1. Introduction

The research on controlled diffusion processes took root in the sixties as a natural sequel to the developments in deterministic optimal control on one hand and in Markov decision processes on the other. From the former it inherited the legacy of 'compactness-lower semi-continuity' arguments for existence of optima and the Hamilton-Jacobi and (Pontryagin) maximum principle approaches to sufficient, resp. necessary conditions for optimality. From the latter it inherited the basic problem formulations corresponding to different cost functionals and, more importantly, the notions of adapted (more generally, non-anticipative) controls, noisy observations, etc., which are peculiar to the stochastic set-up. As the field matured, this union proved to be greater than the sum of its parts and has contributed not only to its parent disciplines, but also to the theory of nonlinear partial differential equations, mathematical finance, etc. In this survey I shall attempt to give a comprehensive, though not exhaustive, overview of the main strands of research in controlled diffusion processes.

The survey is organized as follows. The next section sets up the basic framework and solution concepts, defines the different classes of admissible control processes, and lists the standard problems in stochastic control classified according to the cost functional. Section 3 describes some motivating examples. Section 4 surveys the key results concerning the existence of optimal policies under complete and partial observations. Section 5 deals with the characterization of optimality. The latter sphere of activity is dominated by dynamic programming and this is reflected in my write-up as well: comparatively less space is devoted to the other important strand, viz., the stochastic maximum principle, for which pointers to the literature are provided. Section 6 briefly describes the computational issues. Section 7 presents an assortment of special topics.

*This is an original survey paper.

Throughout the article, I have given a few representative references, making 
no effort to be exhaustive (which would be an impossible task anyway). 

2. The Control Problems 
2.1. Solution concepts 

Throughout what follows, we denote by $\mathcal{P}(S)$ the Polish space of probability measures on a Polish space $S$ with the Prohorov topology. $\mathcal{L}(Z)$ will correspondingly stand for the law of (an $S$-valued random variable) $Z$, viewed as an element of $\mathcal{P}(S)$. Also, for any $f : \mathbb{R}_+ \to S$ and $I \subset \mathbb{R}_+$, $f(I)$ denotes the trajectory segment $\{f(t), t \in I\}$.

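The basic object of study here will be the $d$-dimensional ($d \ge 1$) controlled diffusion process $X(\cdot) = [X_1(\cdot), \dots, X_d(\cdot)]^T$ described by the stochastic differential equation

$$X(t) = X_0 + \int_0^t m(X(s), u(s))\,ds + \int_0^t \sigma(X(s), u(s))\,dW(s) \tag{1}$$

for $t \ge 0$. Here: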

1. for a compact metric 'control space' $U$, $m(\cdot,\cdot) = [m_1(\cdot,\cdot), \dots, m_d(\cdot,\cdot)]^T : \mathbb{R}^d \times U \to \mathbb{R}^d$ is continuous and Lipschitz in the first argument uniformly with respect to the second,

2. $\sigma(\cdot,\cdot) = [[\sigma_{ij}(\cdot,\cdot)]]_{1 \le i,j \le d} : \mathbb{R}^d \times U \to \mathbb{R}^{d \times d}$ is Lipschitz in its first argument uniformly with respect to the second,

3. $X_0$ is an $\mathbb{R}^d$-valued random variable with a prescribed law $\pi_0$,

4. $W(\cdot) = [W_1(\cdot), \dots, W_d(\cdot)]^T$ is a $d$-dimensional standard Brownian motion independent of $X_0$,

5. $u(\cdot) : \mathbb{R}_+ \to U$ is the 'control process' with measurable paths, satisfying the non-anticipativity condition: for $t > s \ge 0$, $W(t) - W(s)$ is independent of $\{X_0, W(y), u(y), y \le s\}$. (In other words, $u(\cdot)$ does not anticipate the future increments of $W(\cdot)$.)

We shall say that (1) is non-degenerate if the least eigenvalue of $\sigma(\cdot,\cdot)\sigma^T(\cdot,\cdot)$ is uniformly bounded away from zero, and degenerate otherwise. The two solution concepts that we shall consider are:

1. Strong solution: Here we assume $X_0$, $W(\cdot)$, $u(\cdot)$ to be given on a prescribed probability space $(\Omega, \mathcal{F}, P)$ and consider the corresponding $X(\cdot)$ given by (1).



V.S. Borkar/ Controlled diffusions 



215 



That there will be an almost surely unique $X(\cdot)$ can be proved by standard arguments using the Itô-Picard iterations as in [?, Ch. 4].
2. Weak solution: Here we assume that only the law of the pair $(X(\cdot), u(\cdot))$ is prescribed and consider any $(X(\cdot), u(\cdot), W(\cdot), X_0)$ on some probability space conforming to the above prescription. 'Uniqueness' then is interpreted as uniqueness in law.
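A minimal numerical sketch of the strong-solution viewpoint, in which the Brownian increments are given and $X(\cdot)$ is computed from them, is the Euler-Maruyama scheme for (1) under a stationary Markov control $u(t) = v(X(t))$. The function name `euler_maruyama` and the one-dimensional coefficients $m(x,u) = -x + u$, $\sigma \equiv 1$, $v(x) = -0.5x$ are hypothetical choices for illustration, not taken from the text.

```python
import numpy as np

def euler_maruyama(m, sigma, v, x0, dW, dt):
    """Approximate X(t) = X0 + int m(X,u) ds + int sigma(X,u) dW
    under a stationary Markov control u(t) = v(X(t)).

    m(x, u) -> drift vector of shape (d,), sigma(x, u) -> matrix of shape (d, d),
    v(x) -> control value, dW -> Brownian increments of shape (n_steps, d).
    Returns the discretized path, shape (n_steps + 1, d).
    """
    x = np.asarray(x0, dtype=float)
    path = [x.copy()]
    for inc in dW:
        u = v(x)                                  # control uses only the current state
        x = x + m(x, u) * dt + sigma(x, u) @ inc  # one Euler-Maruyama step
        path.append(x.copy())
    return np.array(path)

# Hypothetical 1-d example: m(x, u) = -x + u, sigma = 1, v(x) = -0.5 x.
m = lambda x, u: -x + u
sigma = lambda x, u: np.eye(1)
v = lambda x: -0.5 * x

rng = np.random.default_rng(0)
dt, n = 0.01, 1000
dW = rng.normal(0.0, np.sqrt(dt), size=(n, 1))   # the given noise path
X = euler_maruyama(m, sigma, v, [1.0], dW, dt)
# Same noise path, twice the step size: the two approximations should nearly agree.
X_coarse = euler_maruyama(m, sigma, v, [1.0], dW.reshape(n // 2, 2, 1).sum(axis=1), 2 * dt)
```

Feeding the same increments `dW` reproduces the same path exactly, and coarsening the step over the same underlying noise changes the endpoint only slightly; this is a discrete counterpart of the almost sure uniqueness of the strong solution, not a proof of it.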

These are exact counterparts of the corresponding notions for uncontrolled diffusions. Define

$$Lf(x,u) := \langle \nabla f(x), m(x,u) \rangle + \frac{1}{2}\operatorname{tr}\big(\sigma(x,u)\sigma^T(x,u)\nabla^2 f(x)\big) \tag{2}$$

for $f \in C^2(\mathbb{R}^d)$. We may write $L_u f(x)$ for $Lf(x,u)$, treating $u$ as a parameter.
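The generator (2) can be evaluated pointwise once $\nabla f$ and $\nabla^2 f$ are available. The sketch below assumes they are supplied as functions (`grad_f`, `hess_f` and the other names are illustrative), and adds a pointwise check of the quantity entering the non-degeneracy condition, the least eigenvalue of $\sigma\sigma^T$. For $f(x) = \|x\|^2$ one has $Lf(x,u) = 2\langle x, m(x,u)\rangle + \operatorname{tr}(\sigma\sigma^T(x,u))$, which gives an easy hand check.

```python
import numpy as np

def generator(grad_f, hess_f, m, sigma, x, u):
    """L f(x, u) = <grad f(x), m(x, u)> + 0.5 * tr(sigma sigma^T(x, u) Hess f(x)), cf. (2)."""
    s = sigma(x, u)
    a = s @ s.T                                   # diffusion matrix sigma sigma^T
    return float(grad_f(x) @ m(x, u) + 0.5 * np.trace(a @ hess_f(x)))

def least_eigenvalue(sigma, x, u):
    """Smallest eigenvalue of sigma sigma^T at (x, u); non-degeneracy requires this
    to stay bounded away from zero uniformly in (x, u)."""
    s = sigma(x, u)
    return float(np.linalg.eigvalsh(s @ s.T).min())

# Illustration with f(x) = |x|^2, so grad f = 2x and Hess f = 2I (hypothetical coefficients).
grad_f = lambda x: 2.0 * x
hess_f = lambda x: 2.0 * np.eye(len(x))
m = lambda x, u: -x + u
sigma = lambda x, u: np.diag([1.0, 2.0])

x, u = np.array([1.0, 2.0]), np.array([0.5, 0.0])
value = generator(grad_f, hess_f, m, sigma, x, u)   # 2 <x, m> + tr(sigma sigma^T) = -9 + 5
lam_min = least_eigenvalue(sigma, x, u)
```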


Let $\{\mathcal{F}_t\}$ denote the natural filtration of $(X(\cdot), u(\cdot))$, i.e., $\mathcal{F}_t$ = the completion of $\bigcap_{s>t} \sigma(X(y), u(y), y \le s)$. The weak solution is then equivalent to the following 'martingale' formulation:

For any bounded twice continuously differentiable $f : \mathbb{R}^d \to \mathbb{R}$ with bounded first and second order partial derivatives, $f(X(t)) - \int_0^t L_{u(s)} f(X(s))\,ds$, $t \ge 0$, is a martingale w.r.t. $\{\mathcal{F}_t\}$.
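The martingale formulation suggests a simple Monte Carlo sanity check: the average over many paths of $f(X(t)) - f(X(0)) - \int_0^t L_{u(s)} f(X(s))\,ds$ should be close to zero. A minimal sketch, assuming the hypothetical one-dimensional model $m(x,u) = -x + u$, $\sigma \equiv 1$ with stationary Markov control $v(x) = -0.5x$ (so the closed-loop drift is $-1.5x$) and the bounded test function $f = \cos$; the function name is illustrative.

```python
import numpy as np

def martingale_residual(n_paths=20000, T=1.0, dt=0.01, x0=1.0, seed=1):
    """Samples of f(X(T)) - f(X(0)) - int_0^T L_{u(s)} f(X(s)) ds for f = cos,
    along Euler-Maruyama paths of dX = m(X, v(X)) dt + dW with m(x, v(x)) = -1.5 x."""
    rng = np.random.default_rng(seed)
    x = np.full(n_paths, x0)
    integral = np.zeros(n_paths)
    for _ in range(int(round(T / dt))):
        drift = -1.5 * x                                   # -x + v(x), v(x) = -0.5 x
        Lf = drift * (-np.sin(x)) + 0.5 * (-np.cos(x))     # f' = -sin, f'' = -cos, sigma = 1
        integral += Lf * dt                                # left-point Riemann sum
        x = x + drift * dt + rng.normal(0.0, np.sqrt(dt), n_paths)
    return np.cos(x) - np.cos(x0) - integral

residual = martingale_residual()
```

The residual is only approximately centred, because of the Euler discretization and the left-point Riemann sum; both errors are of order `dt`.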

It helps to think of the strong solution as the engineer's world view wherein $W(\cdot)$ is the noise input to a black box along with the chosen input $u(\cdot)$, leading to the 'output' $X(\cdot)$. The weak solution, on the other hand, represents the statistician's viewpoint in which one 'fits' the equation (1) to the known processes $(X(\cdot), u(\cdot))$ with $W(\cdot)$ being the noisy 'residuals'.

2.2. Control classes 

The class of $u(\cdot)$ enunciated above is the most general class of controls that we shall consider, to be referred to as non-anticipative controls. Let $\{\mathcal{F}^X_t\}$ denote the natural filtration of $X(\cdot)$. Obviously, $u(\cdot)$ is adapted to $\{\mathcal{F}_t\}$. We shall say that $u(\cdot)$ is a feedback control if it is also adapted to $\{\mathcal{F}^X_t\}$, i.e., $u(t)$ at each $t$ is a function of the observed trajectory $X([0,t])$. We shall say that it is a Markov control if in addition $u(t) = v(t, X(t))$, $t \ge 0$, for a measurable $v : \mathbb{R}_+ \times \mathbb{R}^d \to U$. Finally, we say that it is a stationary Markov control if $u(t) = v(X(t))$, $t \ge 0$, for a measurable $v : \mathbb{R}^d \to U$.

We shall also need the relaxation of the notion of control process $u(\cdot)$ above to that of a relaxed control process. Here we assume that $U = \mathcal{P}(U_0)$ where $U_0$ is compact metrizable (whence so is $U$) and $m_i(\cdot,\cdot)$, $1 \le i \le d$, are of the form

$$m_i(x,u) = \int \bar{m}_i(x,y)\,u(dy), \quad 1 \le i \le d,$$

for some $\bar{m}_i : \mathbb{R}^d \times U_0 \to \mathbb{R}$ that are continuous and Lipschitz in the first argument uniformly w.r.t. the second. Similarly, $\sigma(\cdot,\cdot)$ will be assumed to be of the form $[[\sigma_{ij}(x,u)]]$ = the nonnegative definite square-root of

$$\int \bar{\sigma}(x,y)\bar{\sigma}^T(x,y)\,u(dy)$$

for $\bar{\sigma} : \mathbb{R}^d \times U_0 \to \mathbb{R}^{d \times d}$ satisfying continuity / Lipschitz conditions akin to those for $\sigma$. See [111, pp. 132-134] for a discussion of the choice of square-root in the uncontrolled case; similar remarks apply here. In addition, we assume that all functions of the form $f(x,u)$, $x \in \mathbb{R}^d$, $u \in U$, appearing in the cost criteria described below are of the form

$$f(x,u) = \int \bar{f}(x,y)\,u(dy)$$

for some $\bar{f} : \mathbb{R}^d \times U_0 \to \mathbb{R}$ satisfying the same regularity or growth conditions as those stipulated for $f$. We may write $u(t) = u(t, dy)$ to underscore the fact that it is a measure-valued process. Then the original notion of $U_0$-valued control $u_0(\cdot)$ (say) corresponds to $u(t, dy) = \delta_{u_0(t)}(dy)$, the Dirac measure at $u_0(t)$, for all $t$. We call such controls precise controls. Precise feedback, Markov or stationary Markov controls may be defined accordingly.

Intuitively, relaxed control generalizes the notion of randomized controls in discrete time problems, but this interpretation has to be treated with care: unlike in the discrete time case, we cannot have independent randomization at each $t$, as that would lead to measurability problems. A better picture is to view $dt \times u(t, dy)$ as a measure on $\mathbb{R}_+ \times U_0$. The set of relaxed controls is then the closure under the weak* topology of the measures $dt \times \delta_{u_0(t)}(dy)$ corresponding to precise controls. In this sense, relaxed controls achieve the compactification and convexification of precise controls, which in turn form a dense subset therein. Unless mentioned otherwise, we shall work with the relaxed control framework. This notion was introduced in deterministic control by L. C. Young [117] and generalized to the stochastic case by Fleming [?]. It is a genuine relaxation in the sense that the corresponding joint laws of $(X(\cdot), u(\cdot))$ contain those corresponding to precise controls as a dense subset, and therefore for most cost functionals of interest the infimum over the latter equals the infimum over the former. The latter is often a minimum thanks to the compactification implicit in the relaxation.
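The density of precise controls among relaxed ones can be seen in a toy example. The sketch below assumes a hypothetical deterministic system ($\sigma \equiv 0$, only to keep it short) on $U_0 = \{0, 1\}$ with $\bar{m}(x, y) = -x + (2y - 1)$; the function names are illustrative. The relaxed control $u(t, dy) = \frac{1}{2}\delta_1(dy) + \frac{1}{2}\delta_0(dy)$ gives the drift $-x$, which no precise control produces exactly; precise controls that switch ('chatter') between $y = 1$ and $y = 0$ with period $h$ reproduce the relaxed trajectory ever more closely as $h \to 0$.

```python
import numpy as np

def m_bar(x, y):
    # Hypothetical drift on U0 = {0, 1}: y = 1 pushes up, y = 0 pushes down.
    return -x + (2.0 * y - 1.0)

def relaxed_path(p, x0=1.0, T=2.0, dt=1e-4):
    """Euler scheme for x' = p m_bar(x, 1) + (1 - p) m_bar(x, 0),
    i.e. the relaxed control u(t, dy) = p delta_1(dy) + (1 - p) delta_0(dy)."""
    x, out = x0, [x0]
    for _ in range(int(round(T / dt))):
        x += (p * m_bar(x, 1.0) + (1.0 - p) * m_bar(x, 0.0)) * dt
        out.append(x)
    return np.array(out)

def chattering_path(p, h, x0=1.0, T=2.0, dt=1e-4):
    """Euler scheme under the precise control that uses y = 1 for the first
    fraction p of each period of length h and y = 0 for the rest."""
    x, out = x0, [x0]
    for k in range(int(round(T / dt))):
        y = 1.0 if (k * dt) % h < p * h else 0.0
        x += m_bar(x, y) * dt
        out.append(x)
    return np.array(out)

r = relaxed_path(0.5)
err_coarse = np.max(np.abs(chattering_path(0.5, 0.1) - r))
err_fine = np.max(np.abs(chattering_path(0.5, 0.01) - r))
```

The maximal deviation shrinks roughly in proportion to the switching period $h$.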

Given (1) with $u(\cdot)$ a relaxed control, one can replace $u(\cdot), W(\cdot)$ in it by a $\tilde{u}(\cdot), \tilde{W}(\cdot)$ where $\tilde{u}(\cdot)$ is feedback and $\tilde{W}(\cdot)$ is another standard Brownian motion. In fact, $\tilde{u}(\cdot)$ is defined simply by $\int f\,d\tilde{u}(t) = E[\int f\,du(t) \mid \mathcal{F}^X_t]$ for $f$ in a countable subset of $C_b(U_0)$ that separates points of $U = \mathcal{P}(U_0)$ and $t \ge 0$. Conversely, if $(X_0, W(\cdot))$ are given on a probability space $(\Omega, \mathcal{F}, P)$ and a weak solution $(X'(\cdot), u'(\cdot), W'(\cdot))$ of (1) is available on some probability space with $\mathcal{L}(X'(0)) = \pi_0$ and $u'(\cdot)$ feedback, then this can be replicated in law by an $(X(\cdot), u(\cdot), W(\cdot))$ on a possibly enlarged $(\Omega, \mathcal{F}, P)$ with $(X_0, W(\cdot))$ as given. See [18] for details in the case of $\sigma$ without explicit control dependence. Extension to the more general case discussed here is straightforward. While the more flexible notion of weak solutions is usually the preferred one in dealing with controlled diffusions, the foregoing allows us to go back and forth between the strong and weak formulations to some extent.
In the non-degenerate case, (1) has a unique strong solution for a Markov control $v$ when $\sigma(\cdot, v(\cdot))$ is Lipschitz [115], which in particular includes the case when there is no explicit control dependence in $\sigma$. The Lipschitz requirement on $\sigma(\cdot, v(\cdot))$ can be relaxed to mere measurability for one and two dimensional problems along the lines of [?, pp. 192-194]. (These results have been established for the case of bounded coefficients, but can be extended to, say, a 'linear growth' condition using a standard localization argument.) Also, the resulting processes can be shown to be strong Feller. On the other hand, in the non-degenerate case (1) always has a unique weak solution for feedback controls when $\sigma$ does not have explicit control dependence and is Lipschitz [?]. If $\sigma$ does have explicit control dependence and the control is stationary Markov, existence of a solution can be established ([?], pp. 86-91), but not its uniqueness ([?]). See, however, the results of [?], which show that under the non-degeneracy hypothesis, the property of having a unique strong solution is generic in a precise sense. (See also [77] for some instances where uniqueness is available.) In the degenerate case, neither existence nor uniqueness of either weak or strong solutions is assured for general measurable controls. Under continuity (resp. Lipschitz) conditions on $m(\cdot, v(\cdot))$, $\sigma(\cdot, v(\cdot))$, existence (resp. existence and uniqueness) of weak (resp. strong) solutions can be established even in the degenerate case [111].

Much of the literature on controlled diffusions does not include control in the diffusion matrix $\sigma(\cdot)$. There are some nontrivial reasons for this. The first is that for stationary Markov controls $u(\cdot) = v(X(\cdot))$, one is in general obliged to consider at best measurable $v(\cdot)$. As mentioned above, for a merely measurable diffusion matrix, even in the non-degenerate case only the existence of a weak solution is available. If one further allows explicit time dependence in $\sigma$, either through the control or otherwise, Lebesgue continuity of transition probabilities can be a problem [43]. Also, for a relaxed control $\mu \in U$ with $\sigma(x, \mu) = \int \bar{\sigma}(x, \cdot)\,d\mu$, $L_\mu f$ above needs to be defined as

$$L_\mu f(x) = \langle \nabla f(x), m(x,\mu) \rangle + \frac{1}{2}\operatorname{tr}\Big(\int \bar{\sigma}(x,y)\bar{\sigma}^T(x,y)\,\mu(dy)\,\nabla^2 f(x)\Big)$$

and not as

$$\langle \nabla f(x), m(x,\mu) \rangle + \frac{1}{2}\operatorname{tr}\big(\sigma(x,\mu)\sigma^T(x,\mu)\nabla^2 f(x)\big),$$

which can lead to problems of interpretation. (In situations where one can show that an optimum precise control exists, one can work around this problem.) This is not to say that the case of control-dependent diffusion matrix has not been studied. There are several situations, such as mathematical finance, where the control dependence of $\sigma$ cannot simply be wished away. Hence there has been a large body of work on problems with a control-dependent diffusion matrix. For example, the p.d.e. issues related to the HJB equations we mention later for such problems have been studied extensively in [?], [76]. More recently, Chinese mathematicians working in this area have developed an impressive body of work for this general case [116].
2.3. Cost structures

Let $k, c \in C(\mathbb{R}^d \times U)$, $h \in C(\mathbb{R}^d)$, $g \in C(\mathbb{R}^d \times \mathbb{R}^d)$, $q \in C(U \times U)$ be prescribed functions with at most linear growth in the space (i.e., $x \in \mathbb{R}^d$) variable. Also, $c \ge 0$. Furthermore, in continuation of our relaxed control framework, $k, c$ are of the form $k(x,u) = \int \bar{k}(x,y)\,u(dy)$, $c(x,u) = \int \bar{c}(x,y)\,u(dy)$, resp., for suitable $\bar{k}, \bar{c} \in C(\mathbb{R}^d \times U_0)$. Some standard cost functionals are:

1. Finite horizon cost: For $T > 0$, minimize

$$E\Big[\int_0^T e^{-\int_0^t c(X(s),u(s))\,ds} k(X(t),u(t))\,dt + e^{-\int_0^T c(X(s),u(s))\,ds} h(X(T))\Big]. \tag{3}$$

Here $c$ is the discount function (discount factor if it is constant), $k$ the so-called 'running cost' function and $h$ the terminal cost function.

2. Cost up to exit time: For an open set $D \subset \mathbb{R}^d$ with a smooth boundary $\partial D$ (more generally, a boundary satisfying the 'exterior cone condition') and $\tau := \min\{t \ge 0 : X(t) \notin D\}$, minimize

$$E\Big[\int_0^\tau e^{-\int_0^t c(X(s),u(s))\,ds} k(X(t),u(t))\,dt + e^{-\int_0^\tau c(X(s),u(s))\,ds} h(X(\tau))\Big]. \tag{4}$$

3. Infinite horizon discounted cost: For $c(\cdot,\cdot) \ge \delta > 0$, minimize

$$E\Big[\int_0^\infty e^{-\int_0^t c(X(s),u(s))\,ds} k(X(t),u(t))\,dt\Big]. \tag{5}$$

This is popular in business applications where discounting is a real phenomenon and not merely a mathematical convenience.

4. Average or 'ergodic' cost: Minimize

$$\limsup_{T \to \infty} \frac{1}{T} \int_0^T E[k(X(t),u(t))]\,dt \tag{6}$$

(the average version), or a.s. minimize

$$\limsup_{T \to \infty} \frac{1}{T} \int_0^T k(X(t),u(t))\,dt \tag{7}$$

(the 'almost sure' version). These are popular in engineering applications where transients are fast, hence negligible, and one is choosing essentially from among the attainable 'steady states'.

5. Risk-sensitive cost: Minimize

$$E\Big[e^{\alpha\left(\int_0^T k(X(t),u(t))\,dt + h(X(T))\right)}\Big] \tag{8}$$

or

$$\limsup_{T \to \infty} \frac{1}{T} \log E\Big[e^{\alpha \int_0^T k(X(t),u(t))\,dt}\Big], \tag{9}$$

where $\alpha > 0$ is a parameter. This cost functional has the advantage of involving 'all moments' of the cost, which matters when the mean alone can be misleading. It also arises naturally in finance applications where compounding effects inherent in the formulation lead to the exponentiation in the cost (9). Risk-sensitive control also has interesting connections with 'robust' control theory.

6. Controlled optimal stopping: Minimize

$$E\Big[\int_0^\tau e^{-\int_0^t c(X(s),u(s))\,ds} k(X(t),u(t))\,dt + e^{-\int_0^\tau c(X(s),u(s))\,ds} h(X(\tau))\Big] \tag{10}$$

over both admissible $u(\cdot)$ and all stopping times $\tau \ge 0$. The 'finite horizon' variation of this replaces $\tau$ above by $\tau \wedge T$ for a given $T > 0$.

7. Impulse control: Here one is allowed to reset the trajectory at stopping times $\{\tau_i\}$ from $X(\tau_i-)$ (the value immediately before $\tau_i$) to a new (non-anticipative) value $X(\tau_i)$, with an associated cost $g(X(\tau_i), X(\tau_i-))$. The aim is to minimize

$$E\Big[\int_0^T e^{-\int_0^t c(X(s),u(s))\,ds} k(X(t),u(t))\,dt + \sum_{\tau_i \le T} e^{-\int_0^{\tau_i} c(X(s),u(s))\,ds} g(X(\tau_i), X(\tau_i-)) + e^{-\int_0^T c(X(s),u(s))\,ds} h(X(T))\Big] \tag{11}$$

over admissible $u(\cdot)$, reset times $\{\tau_i\}$, and reset values $\{X(\tau_i)\}$. Assume $g \ge \delta$ for some $\delta > 0$ to avoid infinitely many jumps in a finite time interval.

8. Optimal switching: Here one is allowed to switch the control $u(\cdot)$ at stopping times $\{\tau_i\}$ from $u(\tau_i-)$ (the value immediately before $\tau_i$) to a new (non-anticipative) value $u(\tau_i)$, with an associated cost $q(u(\tau_i), u(\tau_i-))$. The aim is to minimize

$$E\Big[\int_0^T e^{-\int_0^t c(X(s),u(s))\,ds} k(X(t),u(t))\,dt + \sum_{\tau_i \le T} e^{-\int_0^{\tau_i} c(X(s),u(s))\,ds} q(u(\tau_i), u(\tau_i-)) + e^{-\int_0^T c(X(s),u(s))\,ds} h(X(T))\Big] \tag{12}$$

over reset times $\{\tau_i\}$ and reset values $\{u(\tau_i)\}$. Assume $q \ge \delta$ for some $\delta > 0$ to avoid infinitely many switchings in a finite time interval.

Infinite horizon discounted or ergodic versions of impulse and switching control can also be considered (see, e.g., [?], [?]). The hybrid control problem studied in [23] combines the last two above and more.
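For a fixed control, the cost functionals above are expectations that can be estimated by simulation. The sketch assumes a hypothetical Ornstein-Uhlenbeck model $dX = -X\,dt + dW$ (the control already folded into the drift), constant discount function $c \equiv \alpha$, running cost $k(x) = x^2$ and $h \equiv 0$; the function names are illustrative. The discounted cost (5) is truncated at the simulated horizon; for this model, $X(0) = 1$ and $\alpha = 1$, its exact value is $\int_0^\infty e^{-t}\big(\tfrac12 + \tfrac12 e^{-2t}\big)dt = 2/3$. The risk-sensitive cost (8) is reported through the increasing transform $a^{-1}\log(\cdot)$, which does not change the minimizer; its parameter is called `a` in the code to avoid a clash with the discount rate.

```python
import numpy as np

def simulate_ou(n_paths, T, dt, x0, seed=0):
    """Euler-Maruyama paths of dX = -X dt + dW; returns an array of shape (n_steps + 1, n_paths)."""
    rng = np.random.default_rng(seed)
    n = int(round(T / dt))
    X = np.empty((n + 1, n_paths))
    X[0] = x0
    for k in range(n):
        X[k + 1] = X[k] - X[k] * dt + rng.normal(0.0, np.sqrt(dt), n_paths)
    return X

def discounted_cost(X, dt, alpha, k):
    """Estimate E[int_0^T e^{-alpha t} k(X(t)) dt], i.e. (5) with c = alpha, truncated at T."""
    t = np.arange(X.shape[0] - 1) * dt
    per_path = np.sum(np.exp(-alpha * t)[:, None] * k(X[:-1]) * dt, axis=0)
    return float(per_path.mean())

def risk_sensitive_cost(X, dt, a, k):
    """Return ((1/a) log E[exp(a int_0^T k dt)], E[int_0^T k dt]), cf. (8) with h = 0."""
    Z = np.sum(k(X[:-1]) * dt, axis=0)             # running cost accumulated along each path
    return float(np.log(np.mean(np.exp(a * Z))) / a), float(Z.mean())

X = simulate_ou(n_paths=2000, T=10.0, dt=0.01, x0=1.0)
dc = discounted_cost(X, 0.01, alpha=1.0, k=lambda x: x**2)   # exact value for this model: 2/3
rs, mean_cost = risk_sensitive_cost(X, 0.01, a=0.1, k=lambda x: x**2)
```

Because $a > 0$, Jensen's inequality makes the risk-sensitive value at least the mean cost; for small $a$ it is approximately the mean plus $a/2$ times the variance of the accumulated cost, which is one way of seeing the remark about 'all moments' in item 5.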
3. Examples

Here we sketch some recent applications of controlled diffusions from the literature. The description is necessarily brief and the reader is referred to the original sources for more details.

1. Forest harvesting problem [?]:

In this problem, the so-called 'stochastic forest stand value growth' is described up to extinction time $\gamma$ by

$$X(t) = x + \int_0^t \mu(X(s))\,ds + \int_0^t \sigma(X(s))\,dW(s) - \sum_{k : \tau_k \le t} \zeta_k,$$

where $\gamma = \inf\{t \ge 0 : X(t) \le 0\}$ (possibly $\infty$) and the non-negative, non-anticipative random variables $\{\tau_k\}$, $\{\zeta_k\}$ are respectively the cutting times and the quantities cut at the respective cutting times. The aim is to maximize the forest revenue $E[\sum_k e^{-r\tau_k}(X(\tau_k) - c)]$, where $c > 0$ is the reforestation cost and $r > 0$ the discount factor. This is an impulse control problem.

2. Portfolio optimization [73]:

The wealth process in portfolio optimization satisfies the s.d.e.

$$dX(t) = X(t)\big[(\pi(t)\mu(t) + (1 - \pi(t))r(t))\,dt + \pi(t)\sigma(t)\,dW(t)\big],$$

where $\mu(\cdot), \sigma(\cdot)$ are known and $\pi(\cdot)$ is the $[0,1]$-valued control process that specifies the fraction invested in the risky asset, the remaining wealth being invested in a bond. Here $r(\cdot)$ is a fluctuating interest rate process satisfying

$$dr(t) = a(t)\,dt + b\,dW'(t).$$

Both $a(\cdot), b$ are assumed to be known and $W'(\cdot)$ is a Brownian motion independent of $W(\cdot)$. The aim is to maximize $E[X(T)^\gamma]$ for some $T, \gamma > 0$. ([78] considers a somewhat more general situation.) An alternative 'mean-variance' formulation in the spirit of Markowitz seeks to maximize a linear combination of the mean and negative variance of $X(T)$ [119]. A 'risk-sensitive' version of the problem, on the other hand, seeks to maximize

$$\liminf_{T \uparrow \infty} \frac{-2}{\theta T} \log E\big[e^{-(\theta/2)\log X(T)}\big].$$

See [?] for a slightly more general formulation.

3. Production planning [?]:

Consider a factory producing a single good. Let $y(\cdot)$ denote its inventory level as a function of time and $p(\cdot) \ge 0$ the production rate. $\xi$ will denote the constant demand rate and $\hat{y}$, $\hat{p}$ resp. the factory-optimal inventory level and production rate. The inventory process is modelled as the controlled diffusion

$$dy(t) = (p(t) - \xi)\,dt + \sigma\,dW(t),$$

where $\sigma$ is a constant. The aim is to minimize over non-anticipative $p(\cdot)$ the discounted cost

$$E\Big[\int_0^\infty e^{-\alpha t}\big[c(p(t) - \hat{p})^2 + h(y(t) - \hat{y})^2\big]\,dt\Big]$$

where $c, h$ are known coefficients for the production cost and the inventory holding cost, resp.
4. Heavy traffic limits of queues [62]:

The following control problem arises in the so-called Halfin-Whitt limit of multi-type multi-server queues: Consider a system of $d$ customer classes being jointly served by $N$ identical servers, with $\lambda_i, \mu_i, \gamma_i$ denoting the respective arrival, service and per customer abandonment rates for class $i$. Let $z_i = (\lambda_i/\mu_i)/\sum_j (\lambda_j/\mu_j)$, $1 \le i \le d$. In a suitable scaled limit (the aforementioned Halfin-Whitt limit), the vector of total number of customers of various classes present in the system satisfies the controlled s.d.e.

$$dX(t) = b(X(t), u(t))\,dt + \Sigma\,dW(t),$$

where the $i$-th component of $b(x,u)$ is $b_i(x,u) = -\theta\mu_i - \gamma_i(x_i - u_i) - \mu_i u_i$ and $\Sigma = \mathrm{diag}[\sqrt{2\mu_1 z_1}, \dots, \sqrt{2\mu_d z_d}]$. The parameter $\theta$ has the interpretation of the excess capacity of the server pool in a suitable asymptotic sense. The action space is state-dependent and at $x$ is

$$U(x) = \Big\{u \in \mathbb{R}^d : u \le x, \ \sum_i u_i = \Big(\sum_i x_i\Big) \wedge 0\Big\}.$$

The $i$-th component of the control, $u_i(t)$, will correspond to a scaled limit of the number of servers assigned to the class $i$ at time $t$. The aim is to minimize the cost

$$E\Big[\int_0^\infty e^{-\alpha t} c(X(t), u(t))\,dt\Big]$$

for a discount factor $\alpha > 0$, where $c(x,u) = \sum_i (h_i + \gamma_i p_i)(x_i - u_i)$. Here $h_i, p_i$ are resp. the holding cost and the abandonment penalty for class $i$.
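The production planning example is a discounted linear-quadratic problem, so a quadratic guess solves its HJB equation. Write $\xi$ for the demand rate and $\hat{y}, \hat{p}$ for the factory-optimal inventory level and production rate. As a worked sketch, assume in addition that $\hat{p} = \xi$ (producing at the factory-optimal rate exactly meets demand) and ignore the constraint $p \ge 0$; neither assumption is made in the text. With $z = y - \hat{y}$ and $w = p - \hat{p}$ the dynamics become $dz = w\,dt + \sigma\,dW$, and substituting $V(z) = Pz^2 + R$ into

$$\alpha V(z) = \min_w \big[c w^2 + h z^2 + V'(z) w\big] + \frac{\sigma^2}{2} V''(z)$$

gives the minimizer $w = -(P/c)z$, together with $P^2/c + \alpha P - h = 0$ and $R = \sigma^2 P/\alpha$. The code takes the positive root and compares $V(z_0)$ with a Monte Carlo estimate of the cost under the resulting feedback; the function names and parameter values are illustrative.

```python
import math
import numpy as np

def production_lq(c, h, alpha, sigma):
    """P, feedback gain K = P / c and constant R of V(z) = P z^2 + R, under the
    simplifying assumptions p_hat = xi and no constraint on p."""
    P = c * (-alpha + math.sqrt(alpha**2 + 4.0 * h / c)) / 2.0   # positive root of P^2/c + alpha P - h = 0
    return P, P / c, sigma**2 * P / alpha

def simulated_cost(c, h, alpha, sigma, K, z0, T=15.0, dt=0.01, n_paths=4000, seed=8):
    """Monte Carlo estimate of E int_0^T e^{-alpha t}[c w^2 + h z^2] dt under w = -K z."""
    rng = np.random.default_rng(seed)
    z = np.full(n_paths, z0)
    total = np.zeros(n_paths)
    for k in range(int(round(T / dt))):
        w = -K * z                                                 # linear feedback
        total += math.exp(-alpha * k * dt) * (c * w**2 + h * z**2) * dt
        z = z + w * dt + sigma * rng.normal(0.0, math.sqrt(dt), n_paths)
    return float(total.mean())

P, K, R = production_lq(c=1.0, h=2.0, alpha=1.0, sigma=1.0)   # gives P = K = R = 1
V_at_1 = P * 1.0**2 + R                                        # predicted optimal cost from z(0) = 1
mc = simulated_cost(1.0, 2.0, 1.0, 1.0, K, z0=1.0)
```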

4. Existence results 

4.1. Complete observations 

Early existence theory in controlled diffusions was clearly motivated by the existing 'compactness-continuity' arguments from deterministic optimal control. The latter were based on establishing the sequential compactness of attainable trajectories of the state-control pairs in an appropriate function space and then establishing the continuity (more generally, lower semi-continuity) of the cost functional on it, whence the minimum was guaranteed to be attained. The first extensions of this approach considered the non-degenerate case without explicit control dependence in $\sigma$, under complete observations (i.e., the process $X(\cdot)$ is observed by the controller) and the finite horizon cost. Thus the laws $\mathcal{L}(X(\cdot))$ restricted to a finite time interval were absolutely continuous w.r.t. the law of the corresponding zero drift process, with the Radon-Nikodym derivative given by the Girsanov theorem. Establishing uniform integrability of these Radon-Nikodym derivatives, one obtained their relative sequential compactness in the $\sigma(L_1, L_\infty)$ topology by the Dunford-Pettis theorem. After establishing that every limit point thereof in this topology was also a legal Girsanov functional for some controlled diffusion, this was improved to compactness [?], [?], [?]. (See [?], [?] for some precursors which use more restrictive hypotheses.) [?] gives an ingenious argument to improve this to the existence of optimal Markov controls. [?] took a different approach based on establishing compactness of laws of the controlled processes in the space of probability measures on the trajectory space. While this is completely equivalent to the above for the non-degenerate case with control-independent $\sigma$, it provided a more flexible technique insofar as it could be extended to the degenerate case, control-dependent $\sigma$, infinite dimensional problems, etc.
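The Girsanov density mentioned above can be computed along simulated paths. The sketch assumes $\sigma \equiv 1$ in one dimension and a hypothetical closed-loop drift $m(x, v(x)) = -0.5x$ with $X(0) = 1$; it simulates the zero-drift process $X = X(0) + W$ and attaches to each path the discretized density $\Lambda_T = \exp\big(\sum_k m(X_k)\Delta X_k - \tfrac12 \sum_k m(X_k)^2 \Delta t\big)$. Reweighting by $\Lambda_T$ should then reproduce expectations under the controlled law, here $E[X(1)] \approx e^{-0.5}$. The function name is illustrative.

```python
import numpy as np

def girsanov_weights(drift, x0=1.0, T=1.0, dt=0.01, n_paths=20000, seed=2):
    """Simulate the zero-drift process X = x0 + W on [0, T] together with the discretized
    Girsanov density Lambda_T = exp(sum_k m(X_k) dX_k - 0.5 sum_k m(X_k)^2 dt)
    of the controlled law (drift m, sigma = 1) with respect to the zero-drift law."""
    rng = np.random.default_rng(seed)
    x = np.full(n_paths, x0)
    log_lam = np.zeros(n_paths)
    for _ in range(int(round(T / dt))):
        dx = rng.normal(0.0, np.sqrt(dt), n_paths)   # increments of the zero-drift process
        mk = drift(x)
        log_lam += mk * dx - 0.5 * mk**2 * dt
        x = x + dx
    return x, np.exp(log_lam)

# Hypothetical closed-loop drift m(x, v(x)) = -0.5 x.
xT, lam = girsanov_weights(lambda x: -0.5 * x)
mean_weight = lam.mean()                  # close to 1, since Lambda_T is a density
reweighted_mean = np.mean(lam * xT)       # approximates E[X(1)] under the controlled law
```

Each factor in $\Lambda_T$ is exactly the ratio of the Euler transition density with drift to the one without, so the weighted sample reproduces the Euler approximation of the controlled process.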

The existence of optimal Markov controls can be read off the above for the case $c(\cdot,\cdot)$ = a constant, simply from the fact that the one dimensional marginals of any controlled diffusion can be mimicked by another with a Markov control. This was first proved for the non-degenerate case [?], [?] and later extended to the degenerate case [?]. See [21] for similar results. To handle more general costs, it helps to view them as expectations with respect to appropriately defined 'occupation measures'. For example, the infinite horizon discounted cost

$$E\Big[\int_0^\infty e^{-\alpha t} k(X(t), u(t))\,dt\Big]$$

($\alpha > 0$) can be written as $\int k\,d\mu$, where the 'discounted occupation measure' $\mu$ is defined by

$$\int f\,d\mu := E\Big[\int_0^\infty e^{-\alpha t} \int f(X(t), y)\,u(t, dy)\,dt\Big]$$

for $f \in C_b(\mathbb{R}^d \times U_0)$. This, of course, depends on the initial law, which is assumed to be fixed. The set of attainable $\mu$ can be shown to be convex and compact, and in the non-degenerate case one can show that each element thereof can be realized by a stationary Markov control (i.e., each $\mu$ can be mimicked by a stationary Markov control). In view of the lower semi-continuity of the map $\mu \mapsto \int k\,d\mu$, the desired existence result follows. This approach was initiated in [?] and carried out further in [21]. (In fact, one can show that the extreme points of this set correspond to precise stationary Markov controls; see the discussion of the ergodic control problem below.) In the degenerate case, such a 'mimicry theorem' for occupation measures seems unavailable, but the existence of an optimal Markov (for finite horizon problems) or stationary Markov (for the discounted infinite horizon problem or control up to exit time) control can be established by adapting Krylov's Markov selection procedure ([111], Ch. 12). This was done in [40], [63] following a suggestion of Varadhan. Another variation of the above, applicable to the degenerate case, looks at equivalence classes of $\mathcal{L}(X(\cdot), u(\cdot))$ whose marginals agree a.e. and shows that the extremal equivalence classes in fact correspond to Markov controls [23]. See [24] for a further variation on this theme.
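A discounted occupation measure can be approximated by weighting the states visited along simulated paths by $e^{-\alpha t}\,dt$. As a check, Itô's formula gives $\int (\alpha f - L f)\,d\mu = f(x_0)$ for suitable $f$ when $X(0) = x_0$, and, with the normalization above, $\mu$ has total mass $1/\alpha$. The sketch assumes the hypothetical model $dX = -X\,dt + dW$ (a fixed control folded into the drift), $x_0 = 1$, $\alpha = 1$ and $f(x) = x^2$, for which $\alpha f - L f = 3x^2 - 1$; the function name is illustrative.

```python
import numpy as np

def discounted_occupation_integral(g, alpha, x0=1.0, T=10.0, dt=0.01, n_paths=4000, seed=3):
    """Monte Carlo estimate of int g dmu, where mu is the discounted occupation measure
    of dX = -X dt + dW started at x0: int g dmu = E[int_0^inf e^{-alpha t} g(X(t)) dt],
    truncated at T."""
    rng = np.random.default_rng(seed)
    x = np.full(n_paths, x0)
    total = np.zeros(n_paths)
    for k in range(int(round(T / dt))):
        total += np.exp(-alpha * k * dt) * g(x) * dt       # weight e^{-alpha t} dt
        x = x - x * dt + rng.normal(0.0, np.sqrt(dt), n_paths)
    return float(total.mean())

mass = discounted_occupation_integral(np.ones_like, alpha=1.0)            # total mass 1/alpha = 1
check = discounted_occupation_integral(lambda x: 3 * x**2 - 1, alpha=1.0)  # should be f(x0) = 1
```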

Throughout the foregoing, as one might expect, one has to weaken 'stationary Markov' to 'Markov' if the cost and/or the drift and/or the diffusion matrix of (1) have explicit time dependence. Also, for $U_0 \subset \mathbb{R}^m$, the compactness assumption on $U_0$ can be dropped by penalizing large $\|u(t)\|$, e.g., by including a positive multiple of $\|u(t)\|^2$ in the running cost.

The occupation measure approach is most successful for the ergodic control problem. This has been studied mostly for the case when $\sigma$ does not have explicit control dependence, because of the possible non-uniqueness of solutions under stationary Markov controls when it does. (More generally, one would have to work with 'the set of all solutions' for a stationary Markov control rather than 'the solution'.) Consider the non-degenerate case first. Let $v(\cdot)$ be a stationary Markov control such that the corresponding $X(\cdot)$ is positive recurrent and therefore has a unique stationary distribution $\eta^v \in \mathcal{P}(\mathbb{R}^d)$. Define the corresponding ergodic occupation measure as $\mu^v(dx, dy) := \eta^v(dx)v(x, dy)$. One can show that the set $G$ of attainable $\mu^v$'s is closed and convex, with its extreme points corresponding to precise stationary Markov controls. We can say much more: define the empirical measures $\{\nu_t\}$ by

$$\int f\,d\nu_t := \frac{1}{t}\int_0^t \int f(X(s), y)\,u(s, dy)\,ds, \quad f \in C_b(\mathbb{R}^d \times U_0), \ t > 0.$$

Let $\bar{\mathbb{R}}^d := \mathbb{R}^d \cup \{\infty\}$, the one point compactification of $\mathbb{R}^d$, and view $\nu_t$ as a random variable in $\mathcal{P}(\bar{\mathbb{R}}^d \times U_0)$ that assigns zero mass to $\{\infty\} \times U_0$. Then as $t \to \infty$, $\nu_t$ converges almost surely to the set

$$\big\{\nu : \nu(A) = a\,\nu'(A \cap (\{\infty\} \times U_0)) + (1 - a)\,\nu''(A \cap (\mathbb{R}^d \times U_0)) \ \text{for } A \text{ Borel in } \bar{\mathbb{R}}^d \times U_0, \text{ with } a \in [0,1], \ \nu' \in \mathcal{P}(\{\infty\} \times U_0), \ \nu'' \in G\big\}.$$

This allows us to deduce the existence of an optimal precise stationary Markov control for the 'a.s.' version of the ergodic control problem in two cases: (i) under a suitable 'stability' condition (such as a convenient 'stochastic Liapunov condition') that ensures compactness of $G$ and a.s. tightness of $\{\nu_t\}$, or (ii) under a condition that penalizes escape of probability mass to infinity, such as the 'near-monotonicity condition'

$$\liminf_{\|x\| \to \infty} \min_u k(x, u) > \beta,$$

where $\beta$ = the optimal cost [22]. The latter condition is often satisfied in practice. The 'average' version of the ergodic cost can be handled similarly.
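For a concrete instance of the a.s. ergodic cost (7), consider the hypothetical scalar model $dX = (-X + u)\,dt + dW$ with $k(x,u) = x^2 + u^2$ and linear stationary Markov controls $v(x) = -gx$, $g > -1$. Under $v$ the stationary law is $N(0, 1/(2(1+g)))$, so the ergodic cost is $(1 + g^2)/(2(1+g))$, which over this linear family is minimized at $g = \sqrt{2} - 1$. Since $k(x,u) \ge x^2$, the near-monotonicity condition holds. The sketch compares the exact value with a time average along a single long path, in line with the almost sure convergence of the empirical measures described above; the function names are illustrative.

```python
import numpy as np

def ergodic_cost_exact(g):
    """Ergodic cost of v(x) = -g x: E[x^2 + u^2] under the stationary law N(0, 1/(2(1+g)))."""
    return (1.0 + g**2) / (2.0 * (1.0 + g))

def ergodic_time_average(g, T=2000.0, dt=0.01, x0=0.0, seed=4):
    """(1/T) int_0^T k(X(t), u(t)) dt along one Euler-Maruyama path, with u(t) = -g X(t)."""
    rng = np.random.default_rng(seed)
    n = int(round(T / dt))
    noise = rng.normal(0.0, np.sqrt(dt), n)
    x, acc = x0, 0.0
    for k in range(n):
        u = -g * x
        acc += (x * x + u * u) * dt          # running cost k(x, u) = x^2 + u^2
        x += (-x + u) * dt + noise[k]
    return acc / T

g_star = np.sqrt(2.0) - 1.0
estimate = ergodic_time_average(g_star)
```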
As always, the degenerate case is much harder. Here one shows that, as in the non-degenerate case, $G$ is characterized as $\{\mu : \int L f\,d\mu = 0\}$ for $f$ in a sufficiently rich class of functions in $C^2_0(\mathbb{R}^d)$. That a $\mu \in G$ would satisfy $\int L f\,d\mu = 0$ is easy to see for the stipulated $f$. The hard part is the reverse implication: one shows that there exists a stationary pair $(X(\cdot), u(\cdot))$ that has $\mu$ as its marginal. This extends an important result of [?] to the controlled case [?]. See [?], [?] for some extensions. This characterization helps establish $G$ as a closed convex set, leading to existence of optimal ergodic pairs $(X(\cdot), u(\cdot))$ under suitable (somewhat stronger) stability or near-monotonicity conditions [21]. This can be refined to an optimal stationary Markov control by means of a limiting argument using Krylov selection for the discounted cost as the discount factor approaches zero [16].
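The characterization $\int L f\,d\mu = 0$ can be checked numerically in a case where the stationary law is known. The sketch assumes the hypothetical scalar model with $\sigma \equiv 1$ and closed-loop drift $-\theta x$ (for instance $m(x,u) = -x + u$ with the precise control $v(x) = -0.5x$, so $\theta = 1.5$), whose stationary law is $\eta = N(0, 1/(2\theta))$; the ergodic occupation measure is $\eta(dx)\delta_{v(x)}(dy)$, so $\int L f\,d\mu$ reduces to an integral against $\eta$. With $f = \cos$ the integral vanishes for the stationary variance and equals $e^{-1/2}(\theta - \tfrac12) \ne 0$ for unit variance. The function name is illustrative.

```python
import numpy as np

def integral_of_generator(theta, f1, f2, var=None, half_width=12.0, n=200001):
    """Riemann-sum approximation of int L f d eta for L f = -theta x f'(x) + 0.5 f''(x),
    with eta = N(0, var); var defaults to the stationary variance 1 / (2 theta).
    f1 and f2 are the first and second derivatives of the test function f."""
    if var is None:
        var = 1.0 / (2.0 * theta)
    x = np.linspace(-half_width, half_width, n)
    dx = x[1] - x[0]
    density = np.exp(-x**2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)
    Lf = -theta * x * f1(x) + 0.5 * f2(x)
    return float(np.sum(Lf * density) * dx)

# Test function f = cos, so f' = -sin and f'' = -cos.
at_stationary = integral_of_generator(1.5, lambda x: -np.sin(x), lambda x: -np.cos(x))
at_wrong_law = integral_of_generator(1.5, lambda x: -np.sin(x), lambda x: -np.cos(x), var=1.0)
```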

It should be mentioned that in the non-degenerate case, one often has classical solutions to the associated HJB equation, as we see later, and the existence of an optimal precise stationary Markov (or Markov in the finite horizon case) control can be read off the HJB theory. Thus the direct existence results described above at best give some additional insight, except in some 'non-classical' situations like the constrained problems we encounter later. The 'occupation measure' viewpoint above is also the basis for the linear programming approach we discuss later. In the degenerate case, however, there is significant motivation to pursue these direct approaches.

Finally, we note that such 'direct' existence results are also possible for more general problems involving impulsive and switching controls, etc. See, e.g., [29].

4.2. Partial observations

This corresponds to the situation where there is another $m$-dimensional 'observations' process $Y(\cdot)$ given by

$$Y(t) = \int_0^t b(X(s))\,ds + W'(t), \quad t \ge 0,$$

where $b : \mathbb{R}^d \to \mathbb{R}^m$ is Lipschitz and $W'(\cdot)$ is an $m$-dimensional standard Brownian motion independent of $W(\cdot)$. $W'(\cdot)$ corresponds to (integrated) 'observation noise', as opposed to the 'signal noise' $W(\cdot)$. The situation when the two are not independent is called the 'correlated noise' case and has also been studied in the literature. The objective is to optimize one of the above cost functionals over all control processes $u(\cdot)$ adapted to the natural filtration of $Y(\cdot)$, denoted $\{\mathcal{F}^Y_t\}$. We shall call these strict sense admissible controls, to contrast them with wide sense admissible controls to be defined later.

The correct 'state' (or 'sufficient statistics') process in this case turns out to be the regular conditional law $\pi_t$ of $X(t)$ given $\mathcal{G}_t$ := the right-continuous completion of $\sigma(Y(s), u(s), s \le t)$ for $t \ge 0$. (For strict sense admissible $u(\cdot)$, this is the same as $\{\mathcal{F}^Y_t\}$.) This is a $\mathcal{P}(\mathbb{R}^d)$-valued process whose evolution is described by the Fujisaki-Kallianpur-Kunita equation of nonlinear filtering, described as follows. Let $\nu(f) := \int f\,d\nu$ for any non-negative measure $\nu$ on $\mathbb{R}^d$ and $f \in C_b(\mathbb{R}^d)$. Then for $f \in C^2_0(\mathbb{R}^d)$ := the space of twice continuously differentiable $f : \mathbb{R}^d \to \mathbb{R}$ which vanish at infinity along with their first and second order partial derivatives, one has

$$\pi_t(f) = \pi_0(f) + \int_0^t \pi_s(L_{u(s)} f)\,ds + \int_0^t \langle \pi_s(bf) - \pi_s(b)\pi_s(f), d\hat{Y}(s) \rangle. \tag{13}$$

Here the so-called 'innovations process'

$$\hat{Y}(t) := Y(t) - \int_0^t \pi_s(b)\,ds, \quad t \ge 0,$$

is an $m$-dimensional standard Brownian motion independent of $(X_0, W(\cdot))$ and generating the same filtration as $Y(\cdot)$.

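Let $\bar{\mathcal{F}}_t$ := the right-continuous completion of

$$\sigma(X(s), Y(s), u(s), W(s), W'(s), s \le t)$$

for $t \ge 0$. Let $(\Omega, \mathcal{F}, P)$ denote the underlying probability triple, where $\mathcal{F} = \bigvee_t \bar{\mathcal{F}}_t$ without loss of generality. Define a new probability measure $P_0$ on $(\Omega, \mathcal{F})$ by:

$$\frac{dP}{dP_0}\Big|_{\bar{\mathcal{F}}_t} = \Lambda_t := \exp\Big(\int_0^t \langle b(X(s)), dY(s) \rangle - \frac{1}{2}\int_0^t \|b(X(s))\|^2\,ds\Big), \quad t \ge 0.$$

By Girsanov's theorem, under $P_0$, $Y(\cdot)$ itself is an $m$-dimensional standard Brownian motion independent of $(X_0, W(\cdot))$. Define the process of unnormalized conditional laws $\mu_t$, $t \ge 0$, taking values in $\mathcal{M}(\mathbb{R}^d)$, the space of non-negative measures on $\mathbb{R}^d$ with the weak* topology, as follows:

$$\mu_t(f) := E_0[f(X(t))\Lambda_t \mid \mathcal{G}_t]$$

for a countable collection of $f \in C_b(\mathbb{R}^d)$ that separates points of $\mathcal{M}(\mathbb{R}^d)$, $E_0[\,\cdot\,]$ being the expectation under $P_0$. This evolves according to the Duncan-Mortensen-Zakai equation

$$\mu_t(f) = \mu_0(f) + \int_0^t \mu_s(L_{u(s)} f)\,ds + \int_0^t \langle \mu_s(bf), dY(s) \rangle \tag{14}$$

for $f \in C^2_0(\mathbb{R}^d)$. This has the advantages of linearity and the fact that, viewed under $P_0$, $Y(\cdot)$ itself is the driving Brownian motion. $\mu_t$, $t \ge 0$, is interconvertible with $\pi_t$, $t \ge 0$, through:

$$\pi_t(f) = \frac{\mu_t(f)}{\mu_t(1)},$$

$$\mu_t(f) = \pi_t(f)\exp\Big(\int_0^t \langle \pi_s(b), dY(s) \rangle - \frac{1}{2}\int_0^t \|\pi_s(b)\|^2\,ds\Big).$$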
Here $1$ is the constant function identically equal to $1$. Thus $\mu_t$ is an equivalent state variable. The first equality above justifies the adjective 'unnormalized'.
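The definition $\mu_t(f) = E_0[f(X(t))\Lambda_t \mid \mathcal{G}_t]$ suggests a weighted-particle approximation: under $P_0$ the signal does not depend on $Y(\cdot)$, so one can simulate independent copies of the signal and weight each by $\Lambda_t$ computed along the observed record. This is a plain importance-sampling sketch (no resampling), not a method described in the text. It assumes the hypothetical uncontrolled scalar model $dX = aX\,dt + \sigma\,dW$, $X(0) \sim N(0,1)$, with linear $b(x) = bx$, and compares the normalized estimate $\mu_t(f)/\mu_t(1)$ with the Kalman-Bucy filter, which gives $\pi_t$ in this linear-Gaussian case up to discretization. The function names and parameters are illustrative.

```python
import numpy as np

def weighted_particles(a, sig, b, dY, dt, n_particles=20000, seed=5):
    """Weighted-particle sketch of the unnormalized conditional law mu_t for
    dX = a X dt + sig dW, X(0) ~ N(0, 1), dY = b X dt + dW'.
    Particles move like the signal alone (as under P0) and carry the discretized
    weight Lambda_t = exp(sum_k b X_k dY_k - 0.5 sum_k (b X_k)^2 dt).
    Then mu_t(f) ~ mean(w * f(x)), and pi_t(f) = mu_t(f) / mu_t(1)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, n_particles)
    log_w = np.zeros(n_particles)
    for dy in dY:
        log_w += b * x * dy - 0.5 * (b * x) ** 2 * dt
        x = x + a * x * dt + sig * rng.normal(0.0, np.sqrt(dt), n_particles)
    return x, np.exp(log_w)

def kalman_bucy(a, sig, b, dY, dt, m=0.0, P=1.0):
    """Euler-discretized Kalman-Bucy filter, used here only as a reference value."""
    for dy in dY:
        m, P = (m + a * m * dt + P * b * (dy - b * m * dt),
                P + (2.0 * a * P + sig**2 - (b * P) ** 2) * dt)
    return m, P

# One simulated observation record (hypothetical parameters).
a, sig, b, dt, n = -1.0, 1.0, 1.0, 0.01, 100
rng = np.random.default_rng(6)
x_true, dY = rng.normal(), np.empty(n)
for k in range(n):
    dY[k] = b * x_true * dt + rng.normal(0.0, np.sqrt(dt))
    x_true += a * x_true * dt + sig * rng.normal(0.0, np.sqrt(dt))

xs, w = weighted_particles(a, sig, b, dY, dt)
pf_mean = np.sum(w * xs) / np.sum(w)                  # pi_t(x) = mu_t(x) / mu_t(1)
pf_var = np.sum(w * (xs - pf_mean) ** 2) / np.sum(w)
kb_mean, kb_var = kalman_bucy(a, sig, b, dY, dt)
```

Only ratios of weights matter for $\pi_t$, which is the particle-level counterpart of the conversion formula $\pi_t(f) = \mu_t(f)/\mu_t(1)$.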
Yet another equivalent state variable is defined by the process v^t: ^ ^ Oj given 

by 

Thus /^t(/) — (^t(e^'''^'^/). Suppose that is twice continuously differentiable. 
By an 'integration by parts' argument, {ipt} is seen to evolve according to 

Vt(/) = </'o(/) + / v{Lu{s),sf)ds, (15) 
Jo 

for /e C2(7^^). Here, 

Lu,sf{x) Lufix)- {Wf{x),a{x)<j^{x)Db^{x)Y{s)) 

+{^{Y{s),Db{x)a{x)a^{x)Db'^{x)Y{s)) 

-{Y{s),Dh{x)m{x,u)+l{x)) - \\\b{xW)f, 

where Db is the Jacobian matrix of b and ti{x) ^tT:{a'^ {x)V'^bi{x)a{x)) for 
1 < i < m. p5|l is an ordinary parabolic p.d.e. (as opposed to the stochas- 
tic p.d.e.s H13fl and 114|) 'l with the sample path of F(-) appearing as a random 
parameter. Hence this is called the pathwise filter. Standard p.d.e. theory en- 
sures the well-posedness of the pathwise filter, from which that of the Fujisaki- 
Kallianpur-Kunita and Duncan-Mortensen-Zakai filters may be deduced using 
the conversion formulas [^S] . 
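The conversion formula $\pi_t(f) = \mu_t(f)/\mu_t(1)$, together with the fact that under $P_0$ the signal is independent of $Y(\cdot)$, suggests a simple (if statistically inefficient) Monte Carlo approximation of the filter: simulate independent copies of the signal, weight each copy by a discretization of $\Lambda_t$ evaluated along the observed path, and normalize. The sketch below assumes a hypothetical scalar, uncontrolled model $dX = -X\,dt + dW$, $dY = X\,dt + dW'$ (so $b(x) = x$) with prior $X_0 \sim N(0,1)$; all function names are illustrative.

```python
import math
import random

def simulate_observations(T, n_steps, x0, rng):
    """Hypothetical model dX = -X dt + dW, dY = X dt + dW': returns X(T) and the dY increments."""
    dt = T / n_steps
    x, dys = x0, []
    for _ in range(n_steps):
        dys.append(x * dt + math.sqrt(dt) * rng.gauss(0.0, 1.0))   # b(x) = x
        x += -x * dt + math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return x, dys

def filter_estimate(f, dys, T, n_copies, rng):
    """Approximate pi_T(f) = mu_T(f) / mu_T(1), where mu_T(f) is approximated by averaging
    f(X^i(T)) Lambda^i_T over independent signal copies X^i (prior N(0, 1)), as under P_0."""
    dt = T / len(dys)
    log_w, finals = [], []
    for _ in range(n_copies):
        x, lw = rng.gauss(0.0, 1.0), 0.0
        for dy in dys:
            lw += x * dy - 0.5 * x * x * dt       # discretized log Lambda_T along this copy
            x += -x * dt + math.sqrt(dt) * rng.gauss(0.0, 1.0)
        log_w.append(lw)
        finals.append(x)
    shift = max(log_w)                            # common factor; it cancels in the ratio
    w = [math.exp(lw - shift) for lw in log_w]
    return sum(wi * f(xi) for wi, xi in zip(w, finals)) / sum(w)

rng = random.Random(0)
x_true, dys = simulate_observations(1.0, 100, 1.0, rng)
estimate = filter_estimate(lambda x: x, dys, 1.0, 2000, rng)   # conditional mean of X(1)
```

The division by the total weight is exactly the division by $\mu_t(1)$. Over long horizons the weights degenerate (a few copies carry almost all the mass), which is why practical particle filters resample; that refinement is omitted here.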
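Note that for $f \in C_b(\mathbb{R}^d \times U)$,

$$E[f(X(t), u(t))] = E[\pi_t(f(\cdot, u(t)))] = E_0[\mu_t(f(\cdot, u(t)))] = E_0\big[\psi_t\big(e^{\langle b(\cdot), Y(t)\rangle} f(\cdot, u(t))\big)\big].$$

Thus, for example, the finite horizon cost

$$E\left[\int_0^T k(X(t), u(t))\,dt + h(X(T))\right] \qquad (16)$$

equals

$$E\left[\int_0^T \pi_t(k(\cdot, u(t)))\,dt + \pi_T(h)\right] \qquad (17)$$

$$= E_0\left[\int_0^T \mu_t(k(\cdot, u(t)))\,dt + \mu_T(h)\right] \qquad (18)$$

$$= E_0\left[\int_0^T \psi_t\big(e^{\langle b(\cdot), Y(t)\rangle} k(\cdot, u(t))\big)\,dt + \psi_T\big(e^{\langle b(\cdot), Y(T)\rangle} h\big)\right]. \qquad (19)$$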
Hence the control problem of minimizing (16) under partial observations can be viewed as the equivalent problem of controlling the $\mathcal{P}(\mathbb{R}^d)$-valued (resp., $\mathcal{M}(\mathbb{R}^d)$-valued) process $\{\pi_t\}$ (resp., $\{\mu_t\}$, $\{\psi_t\}$) with cost (17) (resp., (18), (19)). These are called separated control problems because they 'separate', i.e., compartmentalize, the two issues of state estimation and control. When $\pi_t$ or any of the equivalent state variables can be characterized by finite dimensional 'sufficient statistics', the problem reduces to a finite dimensional control problem. Such instances are rare, but include the important case of linear systems with linear observations (i.e., $m$, $b$ linear and $\sigma$ constant), Gaussian $\pi_0$, and a quadratic cost. Here $\pi_t$ is Gaussian and is completely characterized by its first two moments, i.e., by the conditional mean and conditional covariance of the state given the observations.
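In the scalar case these two sufficient statistics, the conditional mean $\hat{x}(t)$ and the conditional variance $P(t)$, evolve according to the Kalman-Bucy filter. A minimal Euler-discretized sketch, assuming a hypothetical model $dX = (aX + u)\,dt + s\,dW$, $dY = X\,dt + dW'$ with unit observation noise, for which $d\hat{x} = (a\hat{x} + u)\,dt + P\,(dY - \hat{x}\,dt)$ and $dP/dt = 2aP + s^2 - P^2$:

```python
import math
import random

def simulate(a, s, x0, n_steps, dt, rng):
    """Hypothetical model dX = aX dt + s dW, dY = X dt + dW' (u = 0): returns X(T) and dY increments."""
    x, dys = x0, []
    for _ in range(n_steps):
        dys.append(x * dt + math.sqrt(dt) * rng.gauss(0.0, 1.0))
        x += a * x * dt + s * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return x, dys

def kalman_bucy(a, s, p0, xhat0, dys, dt, controls=None):
    """Euler discretization of the scalar Kalman-Bucy filter for dX = (aX + u) dt + s dW,
    dY = X dt + dW'. Returns the conditional mean and variance at the final time."""
    xhat, p = xhat0, p0
    for k, dy in enumerate(dys):
        u = 0.0 if controls is None else controls[k]
        # mean: model prediction plus gain p times the innovation dY - xhat dt
        xhat += (a * xhat + u) * dt + p * (dy - xhat * dt)
        # variance: Riccati equation dP/dt = 2aP + s^2 - P^2, which does not involve the data
        p += (2.0 * a * p + s * s - p * p) * dt
    return xhat, p

def steady_state_variance(a, s):
    """Positive root of 2aP + s^2 - P^2 = 0."""
    return a + math.sqrt(a * a + s * s)

rng = random.Random(1)
x_T, dys = simulate(-1.0, 1.0, 0.5, 2000, 0.005, rng)
xhat, p = kalman_bucy(-1.0, 1.0, 1.0, 0.0, dys, 0.005)
```

The variance equation involves neither the observations nor the control, so $P(t)$ can be computed off line; only the conditional mean has to be propagated from the data, and it serves as the finite dimensional state of the reduced control problem.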

For the discounted cost with discount function $c$, one replaces $\pi_t$ by $\tilde{\pi}_t$ in (13) and (17), where $\tilde{\pi}_t(f) \stackrel{\text{def}}{=} E\big[e^{-\int_0^t c(X(s),u(s))ds} f(X(t)) \mid \mathcal{G}_t\big]$ for $f \in C_b(\mathbb{R}^d)$. Correspondingly, replace $L_u$ by $L_u - c(\cdot, u)$ in (13). Similar adjustments can be made for (14) and (15).

The existence of an optimal strict sense admissible control for this problem remains an open issue. The best known result is the existence of an optimal wide sense admissible control [48]. Say that $u(\cdot)$ is wide sense admissible if for each $t \geq 0$, $Y(t + \cdot) - Y(t)$ is independent of $(X_0, W(\cdot), \{u(s), Y(s), s \leq t\})$ under $P_0$. This clearly includes strict sense admissible controls and can be shown to be a valid relaxation thereof, in the sense that the infimum of the cost over either set is the same. The proof of the existence claim is based on weak convergence arguments for measure-valued processes, akin to the completely observed case, and exploits the fact that wide sense admissibility is preserved under convergence of the joint laws of the processes concerned. A refinement based on Krylov's Markov selection procedure leads to the existence of an optimal Markov control (i.e., one that at each time depends only on the current value of the measure-valued filter) for the separated control problem [41].

Similar developments are possible for the other cost functionals. For control up to the exit time from $D$, one replaces $\{\pi_t\}$ with a suitably modified measure-valued process that is supported on $D$ [?]. For ergodic control, the separated control problem can be formulated for $\{\pi_t, u(t), t \geq 0\}$ exactly as above, and the existence theory is analogous to that for degenerate diffusions under complete observations described above, modulo additional technicalities [?], [?]. For risk-sensitive control, one needs a modification of the measure-valued process along the lines described above for handling the discount function $c$.

5. Characterization of Optimal Controls 
5.1. HJB equation - the classical case 

We begin with the dynamic programming principle, which is usually the preferred approach to the characterization of optimality in controlled diffusions (see, e.g., [?]). To start with, consider the non-degenerate case, and in particular the finite horizon cost. Define the 'value function'

$$V(x,t) \stackrel{\text{def}}{=} \inf E\left[\int_t^T e^{-\int_t^y c(X(s),u(s))ds}\, k(X(y),u(y))\,dy + e^{-\int_t^T c(X(s),u(s))ds}\, h(X(T)) \,\Big|\, X(t) = x\right],$$

where the infimum is over all admissible controls. Then by the standard dynamic programming heuristic, for $t' > t$,

$$V(x,t) = \inf E\left[\int_t^{t'} e^{-\int_t^y c(X(s),u(s))ds}\, k(X(y),u(y))\,dy + e^{-\int_t^{t'} c(X(s),u(s))ds}\, V(X(t'),t') \,\Big|\, X(t) = x\right].$$

In words, if one is at point $x$ at time $t$, then the 'minimum cost to go' is the minimum of the sum of the cost paid over $[t, t']$ and the minimum cost to go from time $t'$ onwards. Let $t' = t + \Delta$ for some small $\Delta > 0$. Then

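$$V(x,t) \approx \min_u E\left[k(X(t),u(t))\Delta + e^{-c(X(t),u(t))\Delta}\, V(X(t+\Delta), t+\Delta) \mid X(t) = x\right].$$

Thus

$$\min_u E\left[k(X(t),u(t))\Delta + e^{-c(X(t),u(t))\Delta}\, V(X(t+\Delta), t+\Delta) - V(x,t) \mid X(t) = x\right]\Big/\Delta \approx 0.$$

Assuming sufficient regularity of $V$, letting $\Delta \downarrow 0$ formally leads to

$$\frac{\partial V}{\partial t}(x,t) + \min_u\Big(k(x,u) + \langle \nabla_x V(x,t), m(x,u)\rangle - c(x,u)V(x,t) + \frac{1}{2}\mathrm{tr}\big(\sigma(x,u)\sigma^T(x,u)\nabla_x^2 V(x,t)\big)\Big) = 0, \qquad (20)$$

where $\nabla_x$, $\nabla_x^2$ denote the gradient and the Hessian in the $x$ variable. This is the HJB equation for the finite horizon control problem, with the terminal condition $V(x,T) = h(x)$ for all $x$.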
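To illustrate how (20) can be attacked numerically, here is a sketch of an explicit, upwind finite-difference scheme (a monotone scheme in the spirit of the Markov chain approximations discussed in Section 6) for a hypothetical scalar example: $m(x,u) = u$ with $|u| \leq u_{max}$, constant $\sigma$, $k(x,u) = x^2 + u^2$, $c \equiv 0$, $h \equiv 0$. In this example (20) has the explicit solution $V(x,t) = \tanh(T-t)x^2 + \sigma^2\log\cosh(T-t)$, which is used here both to close the truncated domain $[-L, L]$ and to check the result; the function name and all parameters are illustrative.

```python
import math

def solve_lq_hjb(sigma=1.0, T=1.0, L=3.0, h=0.05, dt=0.002, u_max=3.0):
    """Explicit upwind scheme for (20) with m(x,u) = u (|u| <= u_max), constant sigma,
    k(x,u) = x^2 + u^2, c = 0 and terminal cost 0, on the truncated grid [-L, L].
    Monotonicity needs dt * (sigma**2 / h**2 + u_max / h) <= 1."""
    n = int(round(2 * L / h)) + 1
    xs = [-L + i * h for i in range(n)]

    def exact(x, tau):
        # known solution for this example; used only as Dirichlet data at x = -L and x = L
        return math.tanh(tau) * x * x + sigma ** 2 * math.log(math.cosh(tau))

    v = [0.0] * n                                   # V(x, T) = 0
    for j in range(int(round(T / dt))):
        tau = (j + 1) * dt                          # time to go after this backward step
        new = v[:]
        for i in range(1, n - 1):
            d_fwd = (v[i + 1] - v[i]) / h
            d_bwd = (v[i] - v[i - 1]) / h
            # upwinding: forward difference when u >= 0, backward difference when u <= 0;
            # each one-sided minimization of u^2 + u * D is solved in closed form and clipped
            u_pos = min(max(-d_fwd / 2.0, 0.0), u_max)
            u_neg = min(max(-d_bwd / 2.0, -u_max), 0.0)
            ham = min(u_pos * u_pos + u_pos * d_fwd, u_neg * u_neg + u_neg * d_bwd)
            diff = 0.5 * sigma ** 2 * (v[i + 1] - 2.0 * v[i] + v[i - 1]) / h ** 2
            new[i] = v[i] + dt * (xs[i] ** 2 + ham + diff)   # V(t - dt) = V(t) + dt * min(...)
        new[0], new[-1] = exact(xs[0], tau), exact(xs[-1], tau)
        v = new
    return xs, v

xs, v = solve_lq_hjb()
```

With the default parameters the computed value at $x = 0$, $t = 0$ should be close to the exact value $\log\cosh 1$; the remaining gap comes mainly from the first-order upwind differences.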

The above is an instance of how the dynamic programming heuristic is used to guess the correct HJB equation. The equation is then analyzed by invoking standard p.d.e. theory. For example, under appropriate boundedness and regularity conditions on $m, \sigma, k, h, c$, (20) has a unique solution in the Sobolev space $W_p^{2,1}(\mathbb{R}^d \times [0,T])$ for any $p$, $2 < p < \infty$. When $\sigma$ does not have explicit control dependence, this is a quasi-linear p.d.e. as opposed to a fully nonlinear one, and the existence of a unique solution can be established in the class of bounded $f : \mathbb{R}^d \times [0,T] \to \mathbb{R}$ which are twice continuously differentiable in the first variable and once continuously differentiable in the second [86]. In either case, that this solution indeed equals the value function follows by a straightforward argument based on Krylov's extension of the Itô formula ([?], p. 122). Implicit in this is the following 'verification theorem': A Markov control $v : \mathbb{R}^d \times [0,T] \to U$ is optimal if and only if

$$v(x,t) \in \mathrm{Argmin}_u\big(k(x,u) - c(x,u)V(x,t) + L_uV(x,t)\big) \quad \text{a.e.}, \qquad (21)$$

where $L_uV(x,t) = \langle \nabla_x V(x,t), m(x,u)\rangle + \frac{1}{2}\mathrm{tr}\big(\sigma(x,u)\sigma^T(x,u)\nabla_x^2 V(x,t)\big)$ as in (20). The existence of a measurable $v(\cdot)$ satisfying the above follows from a standard measurable selection theorem [102].
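For a concrete instance of (21), consider a hypothetical scalar example with $m(x,u) = u$, constant $\sigma$, $k(x,u) = x^2 + u^2$, $c \equiv 0$ and $h \equiv 0$ on $[0,T]$, for which $V(x,t) = \tanh(T-t)x^2 + \sigma^2\log\cosh(T-t)$ solves (20) (this can be checked by substitution). Since $\sigma$ does not depend on $u$, only $u^2 + u\,\partial V/\partial x$ matters in the argmin, and (21) gives $v(x,t) = -\tanh(T-t)x$. The sketch recovers this feedback by brute-force minimization over a control grid, which is how one would proceed when no closed form is available; the function name and grid are illustrative.

```python
import math

def grid_argmin_control(x, t, T=1.0, u_grid=None):
    """Hypothetical example: m(x,u) = u, constant sigma, k(x,u) = x^2 + u^2, c = 0, h = 0.
    Returns a minimizer over u_grid of k(x,u) + L_u V(x,t), dropping u-independent terms."""
    if u_grid is None:
        u_grid = [-3.0 + 0.001 * i for i in range(6001)]
    v_x = 2.0 * math.tanh(T - t) * x      # dV/dx for V = tanh(T-t) x^2 + sigma^2 log cosh(T-t)
    return min(u_grid, key=lambda u: u * u + u * v_x)

u_star = grid_argmin_control(0.7, 0.5)    # compare with -tanh(0.5) * 0.7
```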

For control up to the exit time $\tau$ from $D$, it makes obvious sense to define

$$V(x) \stackrel{\text{def}}{=} \inf E\left[\int_0^\tau e^{-\int_0^t c(X(s),u(s))ds}\, k(X(t),u(t))\,dt + e^{-\int_0^\tau c(X(s),u(s))ds}\, h(X(\tau)) \,\Big|\, X(0) = x\right],$$

the infimum being over all admissible controls. There is no explicit time dependence in $V$ because the 'possible futures till $\tau$' look the same from a given state regardless of when one arrived there. A heuristic similar to the above leads to the HJB equation

$$\min_u\big(k(x,u) - c(x,u)V(x) + LV(x,u)\big) = 0 \qquad (22)$$

with $V(x) = h(x)$ for $x \in \partial D$. This has a unique solution in $W^{2,p}_{loc}(D) \cap C(\bar{D})$ [?]. A verification theorem for optimal stationary Markov controls along the lines of (21) can be established.

For the infinite horizon discounted cost, the HJB equation for

$$V(x) \stackrel{\text{def}}{=} \inf E\left[\int_0^\infty e^{-\int_0^t c(X(s),u(s))ds}\, k(X(t),u(t))\,dt \,\Big|\, X(0) = x\right] \qquad (23)$$

is (22) on the whole space, and for $k$ bounded from below, the 'value function' defined above is its least solution in $W^{2,p}_{loc}(\mathbb{R}^d)$. An appropriate verification theorem holds. In both this and the preceding case, '$W^{2,p}_{loc}$' can be replaced by '$C^2$' in the quasi-linear case corresponding to control-independent $\sigma(\cdot)$.

The situation for ergodic control is more difficult. Let $V^\alpha$ denote the $V$ of (23) when $c \equiv$ a constant $\alpha > 0$. Define $\bar{V}^\alpha(x) \stackrel{\text{def}}{=} V^\alpha(x) - V^\alpha(0)$. Then $\bar{V}^\alpha$ satisfies

$$\min_u\big(k(x,u) - \alpha\bar{V}^\alpha(x) - \alpha V^\alpha(0) + L\bar{V}^\alpha(x,u)\big) = 0. \qquad (24)$$

Under suitable technical conditions (such as the near-monotonicity or stability conditions mentioned above) one can show that as $\alpha \downarrow 0$, $\bar{V}^\alpha(\cdot)$ and $\alpha V^\alpha(0)$ converge along a subsequence to some $V$ and $\beta$ in, respectively, an appropriate Sobolev space and $\mathbb{R}$. Letting $\alpha \downarrow 0$ along this subsequence in (24), these are seen to satisfy

$$\min_u\big(k(x,u) - \beta + LV(x,u)\big) = 0.$$

This is the HJB equation of ergodic control. One can show uniqueness of $\beta$ as being the optimal ergodic cost, and of $V$ up to an additive scalar, in an appropriate function class depending on the set of assumptions one is working with. A verification theorem holds [?].
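The vanishing discount argument can be followed explicitly in a hypothetical scalar example: $dX = u\,dt + \sigma\,dW$, $k(x,u) = x^2 + u^2$, $c \equiv \alpha$. Substituting $V^\alpha(x) = p_\alpha x^2 + r_\alpha$ into (22) gives $p_\alpha^2 + \alpha p_\alpha - 1 = 0$ and $r_\alpha = \sigma^2 p_\alpha/\alpha$, so $\bar{V}^\alpha(x) = p_\alpha x^2 \to x^2$ and $\alpha V^\alpha(0) = \sigma^2 p_\alpha \to \sigma^2$ as $\alpha \downarrow 0$; indeed $V(x) = x^2$, $\beta = \sigma^2$ solves the ergodic HJB equation above. The sketch tabulates this convergence.

```python
import math

def discounted_lq(alpha, sigma=1.0):
    """Hypothetical scalar example: returns (p_alpha, V^alpha(0)), where
    V^alpha(x) = p_alpha x^2 + sigma^2 p_alpha / alpha solves the discounted HJB equation."""
    p = (math.sqrt(alpha * alpha + 4.0) - alpha) / 2.0   # positive root of p^2 + alpha p - 1 = 0
    return p, sigma ** 2 * p / alpha

for alpha in (1.0, 0.1, 0.01, 0.001):
    p, v0 = discounted_lq(alpha)
    # bar V^alpha(x) = p_alpha x^2 tends to x^2 and alpha V^alpha(0) tends to beta = sigma^2
    print(alpha, p, alpha * v0)
```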

For risk-sensitive control, the HJB equations are

$$\frac{\partial V}{\partial t}(x,t) + \min_u\big(k(x,u)V(x,t) + LV(x,t,u)\big) = 0$$

in the finite time horizon case and

$$\min_u\big((k(x,u) - \lambda^*)V(x) + LV(x,u)\big) = 0$$

in the infinite time horizon case. One usually needs some technical restrictions on $k$, in particular so that the cost is in fact finite under some control. It has been found more convenient to transform these HJB equations by the logarithmic transformation $\phi = -\log V$. The $\phi$ thus defined satisfies the so-called Hamilton-Jacobi-Isaacs equation, the counterpart of the HJB equation for two person zero sum stochastic differential games, with finite horizon, resp. ergodic, payoffs of the type discussed earlier [?]. This transformation, a descendant of the Cole-Hopf transformation that links the Burgers equation to the heat equation, was introduced and effectively used by Fleming and his collaborators, not only for risk-sensitive control but also for several interesting spin-offs in large deviations. See, e.g., [?], Ch. 6.
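The following formal computation (a sketch, assuming $V > 0$ is smooth, solves the finite horizon equation above, and $LV(x,t,u) = \langle\nabla_x V, m(x,u)\rangle + \frac{1}{2}\mathrm{tr}(\sigma\sigma^T(x,u)\nabla_x^2 V)$ as in (20)) shows where the game comes from. With $V = e^{-\phi}$,

$$\frac{\partial V}{\partial t} = -V\frac{\partial\phi}{\partial t}, \qquad \nabla_x V = -V\nabla_x\phi, \qquad \nabla_x^2 V = V\big(\nabla_x\phi\,\nabla_x\phi^T - \nabla_x^2\phi\big),$$

so that $LV(x,t,u) = V\big(-L\phi(x,t,u) + \frac{1}{2}\|\sigma^T(x,u)\nabla_x\phi\|^2\big)$. Dividing the finite horizon equation by $-V < 0$ turns the min into a max:

$$\frac{\partial\phi}{\partial t} + \max_u\Big(L\phi(x,t,u) - k(x,u) - \tfrac{1}{2}\|\sigma^T(x,u)\nabla_x\phi\|^2\Big) = 0.$$

Finally, the identity $-\frac{1}{2}\|a\|^2 = \min_w\big(\langle a, w\rangle + \frac{1}{2}\|w\|^2\big)$ with $a = \sigma^T(x,u)\nabla_x\phi$ yields

$$\frac{\partial\phi}{\partial t} + \max_u\min_w\Big(\langle\nabla_x\phi, m(x,u) + \sigma(x,u)w\rangle + \tfrac{1}{2}\mathrm{tr}\big(\sigma\sigma^T(x,u)\nabla_x^2\phi\big) - k(x,u) + \tfrac{1}{2}\|w\|^2\Big) = 0,$$

an Isaacs equation in which the controller chooses $u$ and a fictitious opponent shifts the drift by $\sigma w$ at a quadratic cost.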

For controlled optimal stopping, the HJB equation gets replaced by the variational inequalities:

$$\min_u\big(k(x,u) - c(x,u)V(x) + LV(x,u)\big) \geq 0,$$

$$h(x) - V(x) \geq 0,$$

$$\min_u\big(k(x,u) - c(x,u)V(x) + LV(x,u)\big)\,\big(h(x) - V(x)\big) = 0.$$

These are a slight generalization of the variational inequalities appearing in obstacle problems and elsewhere in applied mathematics. The intuition behind them is as follows: If it is optimal not to stop in a neighborhood of $x$, the problem reduces there to the earlier control problem and the HJB equation must hold, i.e., the first inequality above is an equality. If it is optimal to stop at $x$, the minimum cost to go, $V(x)$, must equal the cost on stopping, $h(x)$, i.e., the second inequality above is an equality. In either case, the standard dynamic programming heuristic suggests that both inequalities above must hold everywhere. Since one of the two equalities must hold at any given point $x$, the third equality follows.
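A discrete analogue makes the structure of these inequalities concrete. The sketch below works under illustrative assumptions: an uncontrolled $dX = \sigma\,dW$ approximated by a symmetric random walk with step $dx$ and time step $\Delta t = dx^2/\sigma^2$, constant running cost $k$, constant discount rate $c$, a hypothetical stopping cost $h$, and forced stopping at the ends of the grid. It runs value iteration on $V = \min\big(h,\ k\,\Delta t + e^{-c\Delta t}E[V(X_{\Delta t})]\big)$. At the fixed point, $V \leq h$, $V$ does not exceed the continuation value, and one of the two holds with equality at every grid point: these are the discrete counterparts of the three relations above.

```python
import math

def stopping_value(h, xs, k=0.5, c=0.1, sigma=1.0, tol=1e-12, max_iter=100000):
    """Value iteration for V = min(h, k dt + exp(-c dt) E[V(next)]) on a random-walk grid.
    The walk moves +/- dx with probability 1/2 each; dt = dx^2 / sigma^2 matches the variance."""
    dx = xs[1] - xs[0]
    dt = dx * dx / sigma ** 2
    disc = math.exp(-c * dt)
    stop = [h(x) for x in xs]
    v = stop[:]                                   # start from 'stop immediately'
    for _ in range(max_iter):
        new = stop[:]                             # stopping is forced at the two end points
        for i in range(1, len(xs) - 1):
            cont = k * dt + disc * 0.5 * (v[i - 1] + v[i + 1])
            new[i] = min(stop[i], cont)           # stop or continue, whichever is cheaper
        if max(abs(a - b) for a, b in zip(new, v)) < tol:
            return new
        v = new
    return v

xs = [-2.0 + 0.1 * i for i in range(41)]
h = lambda x: math.cos(3.0 * x) + 1.5             # hypothetical stopping cost
v = stopping_value(h, xs)
```

Since the update is a contraction (factor $e^{-c\Delta t}$), the iteration converges; the grid points where $V < h$ form the continuation region.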
The situation for impulse control is similar:

$$\min_u\Big(\frac{\partial V}{\partial t}(x,t) + k(x,u) - c(x,u)V(x,t) + LV(x,t,u)\Big) \geq 0,$$

$$\min_y\big(V(y,t) + g(y,x)\big) - V(x,t) \geq 0,$$

$$\min_u\Big(\frac{\partial V}{\partial t}(x,t) + k(x,u) - c(x,u)V(x,t) + LV(x,t,u)\Big) \times \Big(\min_y\big(V(y,t) + g(y,x)\big) - V(x,t)\Big) = 0.$$

Likewise for optimal switching, we include the control variable 'u' in the state (thus the value function $V$ has three arguments: $x$, $u$ and $t$), and consider:

$$\frac{\partial V}{\partial t}(x,u,t) + k(x,u) - c(x,u)V(x,u,t) + LV(x,u,t) \geq 0,$$

$$\min_y\big(V(x,y,t) + q(y,u)\big) - V(x,u,t) \geq 0,$$

$$\Big(\frac{\partial V}{\partial t}(x,u,t) + k(x,u) - c(x,u)V(x,u,t) + LV(x,u,t)\Big) \times \Big(\min_y\big(V(x,y,t) + q(y,u)\big) - V(x,u,t)\Big) = 0.$$

See [12] for an extensive treatment of applications of variational and quasi-variational inequalities in stochastic control. A more probabilistic treatment of optimal stopping is found in [106]. See [103] for some recent contributions to optimal switching.

In each case above, the appropriate verification theorem holds. Note also that the verification theorem, coupled with a standard measurable selection theorem (see, e.g., [102]), guarantees an optimal Markov or stationary Markov control (as applicable). This is because the respective minima are in fact attained at Dirac measures. See [?] for the inequalities for 'stochastic hybrid control'.

5.2. HJB equation - the degenerate case

In the degenerate case, the HJB equation typically does not have classical solutions. This has led to the notion of viscosity solutions, which provides a unique characterization of the desired solution within a larger class (typically, that of continuous functions). We shall describe this notion in the case of infinite horizon discounted cost problems.

Say that $V$ is a viscosity solution of (22) if for any $\psi \in C^2(\mathbb{R}^d)$,

• at each local maximum $x$ of $V - \psi$,

$$\min_u\big(k(x,u) - c(x,u)V(x) + L\psi(x,u)\big) \geq 0,$$

and,

• at each local minimum $x$ of $V - \psi$,

$$\min_u\big(k(x,u) - c(x,u)V(x) + L\psi(x,u)\big) \leq 0.$$

To see why this makes sense in the first place, note that if $V$ were $C^2$, then at each local maximum of $V - \psi$ the gradients of $V$ and $\psi$ would be equal and the Hessian of $V - \psi$ would be negative semidefinite, so that $LV(x,u) \leq L\psi(x,u)$. Thus if $V$ satisfied the HJB equation, $(V, \psi)$ would satisfy the first inequality above at this point. A similar logic applies to the second statement.
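The definition can be tested on a standard degenerate example, assumed here for illustration and read in the exit-time setting of (22): $D = (-1,1)$, $dX = u\,dt$ with $u \in [-1,1]$ (so $\sigma \equiv 0$), $k \equiv 1$, $c \equiv 0$ and $h \equiv 0$. The value function is the minimal exit time $V(x) = 1 - |x|$, and (22) becomes

$$\min_{|u| \leq 1}\big(1 + u\,V'(x)\big) = 1 - |V'(x)| = 0, \quad x \in (-1,1), \qquad V(\pm 1) = 0.$$

Both $V(x) = 1 - |x|$ and $W(x) = |x| - 1$ satisfy this away from $x = 0$ and meet the boundary condition, so only the kink at $0$ can tell them apart. For $\psi \in C^2$,

$$V - \psi \text{ has a local maximum at } 0 \implies |\psi'(0)| \leq 1 \implies \min_{|u| \leq 1}\big(1 + u\,\psi'(0)\big) = 1 - |\psi'(0)| \geq 0,$$

while $V - \psi$ never has a local minimum at $0$ (that would require both $\psi'(0) \leq -1$ and $\psi'(0) \geq 1$). Hence $V$ is a viscosity solution. For $W$, the choice $\psi \equiv 0$ makes $0$ a local minimum of $W - \psi$, where $1 - |\psi'(0)| = 1 > 0$ violates the second condition; so $W$ is not.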
Note that if one were to add a term $\nu\Delta V$ to $LV$, $\nu > 0$, in (22), then it would be the HJB equation corresponding to replacing $\sigma(\cdot)\sigma^T(\cdot)$ by $\sigma(\cdot)\sigma^T(\cdot) + 2\nu I_d$, $I_d$ being the $d \times d$ identity matrix. This is non-degenerate and thus has a classical solution $V_\nu$, as described above. The viscosity solution is the limit of these as $\nu \downarrow 0$. The term $\nu\Delta V$ appears in equations of fluid mechanics as the 'viscosity' term, hence the terminology. An alternative, equivalent definition of viscosity solutions is possible in terms of sub-differentials [89], [?].

The value function can be shown to be the unique viscosity solution of the HJB equation in an appropriate function class for a wide variety of control problems [?], [?]; see [?], [112] for the corresponding development in the case of variational inequalities.

This leaves open the issue of a verification theorem, wherein the utility of this approach finally resides. While this is not as routine as in the non-degenerate case, recent work using non-smooth analysis has made it possible [118].

We now mention two abstractions of the dynamic programming principle which led to the HJB equations. The first is the martingale dynamic programming principle, formulated first in [?] (written in 1974, though published much later) and developed further in [?], [104]. For the finite horizon problem above, this reduces to the observation that

$$e^{-\int_0^t c(X(s),u(s))ds}\,V(X(t),t) + \int_0^t e^{-\int_0^s c(X(r),u(r))dr}\,k(X(s),u(s))\,ds, \quad t \in [0,T],$$

is a submartingale w.r.t. $\{\mathcal{F}_t\}$, and is a martingale if and only if $u(\cdot)$ is optimal. Similar statements can be formulated for the other problems. The second approach is the nonlinear semigroup developed in [?]. This is the semigroup of operators

$$S_t f(x) \stackrel{\text{def}}{=} \min E\left[\int_0^t e^{-\int_0^s c(X(r),u(r))dr}\,k(X(s),u(s))\,ds + e^{-\int_0^t c(X(s),u(s))ds} f(X(t)) \,\Big|\, X(0) = x\right],$$

where the minimum is over all admissible controls. Under our hypotheses, this can be shown to be a semigroup of positive nonlinear operators $C_b(\mathbb{R}^d) \to C_b(\mathbb{R}^d)$ which is, in a precise sense, the lower envelope of the Markov (linear) semigroups corresponding to constant controls $u(\cdot) \equiv a \in U$. The associated infinitesimal generator has the form

$$\mathcal{L}f(x) = \min_u\big(Lf(x,u) + k(x,u) - c(x,u)f(x)\big).$$

The above are resp. the controlled counterparts of the 'martingale problem' 
and the 'semigroup approach' in Markov process theory, and have the advantage 
that they generalize naturally to more abstract semimartingale, resp. Markov 
process control problems. 
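The martingale characterization can be checked numerically in a hypothetical scalar example: $dX = u\,dt + \sigma\,dW$, $k(x,u) = x^2 + u^2$, $c \equiv 0$, $h \equiv 0$, whose value function is $V(x,t) = \tanh(T-t)x^2 + \sigma^2\log\cosh(T-t)$. Here $M_t = V(X(t),t) + \int_0^t k\,ds$ starts at $M_0 = V(x_0,0)$ and ends at $M_T = \int_0^T k\,ds$, so the submartingale property gives $E[M_T] \geq V(x_0,0)$, with equality under the optimal feedback $u = -\tanh(T-t)x$. The Euler-Maruyama Monte Carlo sketch below compares that feedback with the suboptimal choice $u \equiv 0$; the function name and parameters are illustrative.

```python
import math
import random

def expected_cost(feedback, x0=1.0, sigma=1.0, T=1.0, n_steps=100, n_paths=4000, seed=0):
    """Euler-Maruyama Monte Carlo estimate of E[int_0^T (X^2 + u^2) dt] for dX = u dt + sigma dW.
    With h = 0 this is E[M_T], where M_t = V(X(t), t) + int_0^t k ds and M_0 = V(x0, 0)."""
    rng = random.Random(seed)
    dt = T / n_steps
    total = 0.0
    for _ in range(n_paths):
        x, cost = x0, 0.0
        for j in range(n_steps):
            u = feedback(x, j * dt)
            cost += (x * x + u * u) * dt          # left-point rule for the running cost
            x += u * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        total += cost
    return total / n_paths

T = 1.0
value = math.tanh(T) + math.log(math.cosh(T))     # V(1, 0) for sigma = 1
optimal = expected_cost(lambda x, t: -math.tanh(T - t) * x)
passive = expected_cost(lambda x, t: 0.0)
print(value, optimal, passive)
```

Up to Monte Carlo and time-discretization error, the optimal feedback reproduces $V(x_0,0)$, whereas $u \equiv 0$ gives a strictly larger expected cost, as the strict submartingale property predicts.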
5.3. The stochastic maximum principle

There has also been a considerable body of work on extending the theory of 
necessary conditions for optimality based on the Pontryagin maximum principle 
from deterministic optimal control to the stochastic setting. The earliest effort 
in this direction is perhaps [85]. It may be recalled that the maximum principle involves an 'adjoint process' which evolves backward in time with a given terminal condition. Since stochastic control comes with the additional baggage of the 'arrow of time' specified by the increasing filtration and the associated adaptedness/nonanticipativity issues, this extension is nontrivial and much hard work went into it. See, e.g., [?], which was a landmark contribution in this domain, and the references therein. The advent of 'backward stochastic differential equations' provided a natural framework for handling this, culminating in the very general stochastic maximum principle (for the finite horizon problem) reported in [116].
A typical b.s.d.e. is of the form

$$dY(t) = h(t, Y(t), Z(t))\,dt + Z(t)\,dW(t), \quad t \in [0,T],$$

with the terminal condition $Y(T) = \xi$. Here, for $\{\mathcal{F}_t^W\}$ = the natural filtration of $W(\cdot)$, $\xi$ is a prescribed square integrable random variable measurable with respect to $\mathcal{F}_T^W$. A solution is a pair of $\{\mathcal{F}_t^W\}$-adapted processes $Y(\cdot), Z(\cdot)$ such that

$$Y(t) = \xi - \int_t^T h(s, Y(s), Z(s))\,ds - \int_t^T Z(s)\,dW(s), \quad t \in [0,T].$$

Under a Lipschitz condition on $h$, a unique solution can be shown to exist in a suitable class of $\{\mathcal{F}_t^W\}$-adapted processes ([?], Chapter 7). See [?] for an extensive account of coupled forward-backward stochastic differential equations and their applications to stochastic control and mathematical finance. See also [?].

A special case of the stochastic maximum principle, for $\sigma$ independent of control, is as follows. Assume that $m, \sigma, k, h$ are bounded and twice continuously differentiable in the space ($x$) variable, with the first and second order partial derivatives satisfying the Lipschitz condition. We confine ourselves to $\{\mathcal{F}_t^W\}$-adapted controls $u(\cdot)$. Let $p(\cdot)$, $q(\cdot) = [q_1(\cdot), q_2(\cdot), \ldots]$ be processes adapted to the natural filtration of $W(\cdot)$ and satisfying the backward stochastic differential equation

$$dp(t) = -\Big(\nabla_x m(X(t),u(t))^T p(t) + \sum_j \nabla_x\sigma^j(X(t))^T q_j(t) - \nabla_x k(X(t),u(t))\Big)dt + q(t)\,dW(t), \quad t \in [0,T], \qquad (25)$$

with the terminal condition $p(T) = -\nabla_x h(X(T))$. Here $\sigma^j(\cdot)$ denotes the $j$-th column of $\sigma(\cdot)$. Under the stated conditions, (25) can be shown to have an a.s. unique solution $(p(\cdot), q(\cdot))$. The process $p(\cdot)$ is the desired adjoint process. The maximum principle then states that if $(X(\cdot), u(\cdot))$ is an optimal pair, then one must have

$$\langle p(t), m(X(t),u(t))\rangle - k(X(t),u(t)) = \max_u\big(\langle p(t), m(X(t),u)\rangle - k(X(t),u)\big)$$

for a.e. $t \in [0,T]$, a.s. In fact, the full statement of the stochastic maximum principle in [116] is much more general, allowing for a controlled diffusion matrix $\sigma$.

Comparing with the verification theorem of dynamic programming, one would expect $p(t)$ above to correspond to $-\nabla_x V(X(t),t)$. This may be shown under very strong conditions. More generally, a relationship along these lines has been established in [118] (see also [116]).
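As a sanity check of this correspondence, consider a hypothetical scalar example: $dX = u\,dt + \sigma\,dW$ with constant $\sigma$, $k(x,u) = x^2 + u^2$ and $h \equiv 0$ on $[0,T]$, for which $V(x,t) = \tanh(T-t)x^2 + \sigma^2\log\cosh(T-t)$ and the optimal feedback is $u(t) = -\tanh(T-t)X(t)$. Taking

$$p(t) = -\frac{\partial V}{\partial x}(X(t),t) = -2\tanh(T-t)X(t), \qquad q(t) = -2\sigma\tanh(T-t),$$

Itô's formula along the optimal trajectory, together with $\mathrm{sech}^2 + \tanh^2 = 1$, gives

$$dp(t) = 2\,\mathrm{sech}^2(T-t)X(t)\,dt - 2\tanh(T-t)\,dX(t) = 2X(t)\,dt - 2\sigma\tanh(T-t)\,dW(t),$$

which is (25) with $\nabla_x m = 0$, $\nabla_x\sigma = 0$, $\nabla_x k = 2x$, and $p(T) = 0 = -\nabla_x h$. The maximum condition then reads: $p(t)u - X(t)^2 - u^2$ is maximized at $u = p(t)/2 = -\tanh(T-t)X(t)$, the same feedback as the one given by dynamic programming.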

5.4. Partial observations

The dynamic programming principle under partial observations is usually derived by moving over to the 'separated' control problem of controlling the associated nonlinear filter. In the simpler cases, the 'integral' form of the dynamic programming principle is easy to justify. For example, for the finite horizon problem, define the value function

$$V(\pi, t) \stackrel{\text{def}}{=} \min E\left[\int_t^T \pi_s(k(\cdot, u(s)))\,ds + \pi_T(h) \,\Big|\, \pi_t = \pi\right],$$

where the minimum is over all wide sense admissible controls. This satisfies, for $\Delta > 0$,

$$V(\pi, t) = \min E\left[\int_t^{t+\Delta} \pi_s(k(\cdot, u(s)))\,ds + V(\pi_{t+\Delta}, t+\Delta) \,\Big|\, \pi_t = \pi\right],$$

with the minimum attained if and only if $u(s)$, $s \in [t, t+\Delta]$, is an optimal choice. Analogous statements can be made with the unnormalized law as the state variable. Getting a 'differential' form of this principle in terms of an HJB equation is hard, as the state space, $\mathcal{P}(\mathbb{R}^d)$ or $\mathcal{M}(\mathbb{R}^d)$ (alternatively, the more popular $L^2(\mathbb{R}^d)$ when a square integrable density for the conditional law is available), is infinite dimensional. This has been approached through the theory of viscosity solutions for infinite dimensional p.d.e.s [?], [?]. As for the more abstract versions, the martingale formulation of the dynamic programming principle for the separated control problem is a straightforward counterpart of the completely observed case. See, however, [?] for a different development which derives a martingale dynamic programming principle in a very general set-up (see also [?]). The Nisio semigroup has been developed in [?], [?]. See [?], [?] for some recent developments in the stochastic maximum principle under partial observations and [7] for an early contribution.

6. Computational issues 

Stochastic control problems with elegant explicit solutions tend to be few. There 
are, however, some notable exceptions of great practical importance, such as 
the celebrated 'Linear-Quadratic-Gaussian' problem with linear state dynamics
and quadratic cost, which has become standard textbook material [?]. More
often than not the controlled diffusion problems do not lead to explicit analytic 
solutions and one has to resort to approximations and numerical computations. 
This has led to much research in approximation and computational issues. We 
briefly survey some of the main strands of this research. 

One popular method has been to consider controlled Markov chain approxi- 
mations to controlled diffusions, thereby moving over to discrete time and dis- 
crete state space. One then analyzes the resulting discrete problem by standard 
schemes available for the same. See [?] for an extensive account of a rigorous
theory for this well developed approach. [?] contains some recent extensions of
this approach to stochastic differential games.
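A minimal sketch of the Markov chain approximation idea for a hypothetical scalar discounted problem: $dX = u\,dt + \sigma\,dW$, running cost $x^2 + u^2$, constant discount rate $\alpha$. On a grid of spacing $h$, the usual upwind construction moves to $x \pm h$ with probabilities $(\sigma^2/2 + h u^{\pm})/(\sigma^2 + h|u|)$ over an interpolation interval $\Delta t(u) = h^2/(\sigma^2 + h|u|)$, which matches the drift and, up to $O(h)$, the diffusion of the original process. The resulting finite discounted problem is solved here by policy iteration (each policy evaluation is a tridiagonal linear solve), with moves off the truncated grid $[-L, L]$ reflected. In this example the exact value is $V(x) = p_\alpha x^2 + \sigma^2 p_\alpha/\alpha$ with $p_\alpha = (\sqrt{\alpha^2 + 4} - \alpha)/2$, which serves as a check; function names and parameters are illustrative.

```python
import math

def mca_discounted_lq(alpha=1.0, sigma=1.0, L=3.0, h=0.1, u_max=3.0, n_u=61, max_iter=50):
    """Markov chain approximation of dX = u dt + sigma dW, running cost x^2 + u^2,
    discount rate alpha, solved by policy iteration on the grid -L, -L + h, ..., L."""
    n = int(round(2 * L / h)) + 1
    xs = [-L + i * h for i in range(n)]
    us = [-u_max + 2 * u_max * j / (n_u - 1) for j in range(n_u)]

    def chain(x, u):
        # locally consistent upwind transition probabilities and interpolation interval
        q = sigma ** 2 + h * abs(u)
        p_up = (sigma ** 2 / 2 + h * max(u, 0.0)) / q
        p_dn = (sigma ** 2 / 2 + h * max(-u, 0.0)) / q
        dt = h * h / q
        return p_up, p_dn, math.exp(-alpha * dt), (x * x + u * u) * dt

    def q_value(i, u, v):
        # one-step cost plus discounted expected value; moves off the grid are reflected
        p_up, p_dn, beta, cost = chain(xs[i], u)
        return cost + beta * (p_up * v[min(i + 1, n - 1)] + p_dn * v[max(i - 1, 0)])

    def evaluate(policy):
        # solve the tridiagonal system V = cost + beta * P V with the Thomas algorithm
        lo, di, up, rhs = [0.0] * n, [1.0] * n, [0.0] * n, [0.0] * n
        for i in range(n):
            p_up, p_dn, beta, cost = chain(xs[i], policy[i])
            rhs[i] = cost
            if i > 0:
                lo[i] = -beta * p_dn
            else:
                di[i] -= beta * p_dn              # reflection at the left end
            if i < n - 1:
                up[i] = -beta * p_up
            else:
                di[i] -= beta * p_up              # reflection at the right end
        for i in range(1, n):                     # forward elimination
            w = lo[i] / di[i - 1]
            di[i] -= w * up[i - 1]
            rhs[i] -= w * rhs[i - 1]
        v = [0.0] * n
        v[-1] = rhs[-1] / di[-1]
        for i in range(n - 2, -1, -1):            # back substitution
            v[i] = (rhs[i] - up[i] * v[i + 1]) / di[i]
        return v

    policy = [0.0] * n
    for _ in range(max_iter):
        v = evaluate(policy)
        new_policy = []
        for i in range(n):
            best = min(us, key=lambda u: q_value(i, u, v))
            # switch only on strict improvement, so the loop cannot cycle on ties
            if q_value(i, best, v) < q_value(i, policy[i], v) - 1e-12:
                new_policy.append(best)
            else:
                new_policy.append(policy[i])
        if new_policy == policy:
            break
        policy = new_policy
    return xs, evaluate(policy), policy

xs, v, policy = mca_discounted_lq()
```

Policy iteration is used instead of plain value iteration only because the contraction factor $e^{-\alpha\Delta t}$ is close to 1 for small $h$, which would make value iteration slow in pure Python.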

The other important approach considers the infinite linear program implicit 
in the occupation measure based approach and uses linear programming tools 
(see, e.g., [?]). The ensuing linear program, however, is infinite dimensional,
and its approximation by a finite linear program is needed [97].

The HJB equation, being a nonlinear p.d.e., is open to numerical techniques 
developed for the same. The most important recent developments on this front 
are the ones propelled by the viscosity solutions revolution that use stability 
results for viscosity solutions for rigorous justification. See, e.g., [?].

The recent developments in simulation-based approximate dynamic programming [?], however, have not caught on to a large extent in the controlled diffusion literature, although there is considerable interest in such 'Monte Carlo' techniques in the finance community; see, e.g., [?].

For numerical analysis of stochastic differential equations in general, [75] is
the standard source. A good source for 'Monte Carlo' methods for diffusions is [87].
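As a minimal illustration of 'Monte Carlo' for diffusions, the sketch below applies the Euler-Maruyama scheme to geometric Brownian motion $dX = \mu X\,dt + s X\,dW$ (chosen, as an assumption, only because $E[X(T)] = x_0 e^{\mu T}$ is known in closed form) and reports the estimate of $E[X(T)]$ together with its standard error; the function names and parameter values are illustrative.

```python
import math
import random

def euler_maruyama(drift, diffusion, x0, T, n_steps, rng):
    """Terminal value of one Euler-Maruyama path of dX = drift(X) dt + diffusion(X) dW."""
    dt = T / n_steps
    x = x0
    for _ in range(n_steps):
        x += drift(x) * dt + diffusion(x) * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return x

def mc_mean(mu=0.05, sig=0.2, x0=1.0, T=1.0, n_steps=50, n_paths=10000, seed=1):
    """Monte Carlo estimate of E[X(T)] for geometric Brownian motion, with its standard error."""
    rng = random.Random(seed)
    samples = [euler_maruyama(lambda x: mu * x, lambda x: sig * x, x0, T, n_steps, rng)
               for _ in range(n_paths)]
    mean = sum(samples) / n_paths
    var = sum((s - mean) ** 2 for s in samples) / (n_paths - 1)
    return mean, math.sqrt(var / n_paths)

mean, se = mc_mean()
print(mean, se, math.exp(0.05))   # exact E[X(1)] = exp(mu T) for x0 = 1
```

The statistical error decreases like $n_{paths}^{-1/2}$, while the time-discretization bias of the scheme decreases with the step size; for expectations of this kind the Euler scheme has weak order one.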

7. Other problems 

Here we list some other subareas of controlled diffusions that will not be dis- 
cussed at length here. Only a brief description is given, with some representative 
references. 

1. Singular control: These are problems involving an additive control term in the stochastic 'integral' equation that is of bounded variation, but not necessarily absolutely continuous with respect to the Lebesgue measure. That is,

$$X(t) = X_0 + \int_0^t m(X(s))\,ds + \int_0^t \sigma(X(s))\,dW(s) + A(t),$$

where $A(\cdot)$ is the control. Typically it can be the 'local time at a boundary' that confines the process to a certain bounded region. This research originated in heavy traffic limits of controlled queues [?], [107]. See [2], [?], [?], [?], [?] for some recent contributions and applications to finance.
2. Adaptive control: This concerns the situation when the exact model of
the controlled system is not known and has to be 'learnt on line' while
controlling it. Several alternative approaches to this problem exist in the
discrete time stochastic control literature, but the only one that seems to
have been followed to any significant extent in controlled diffusions is the
'self-tuning' control [?]. In this, one enforces a separation of estimation
and control by estimating the model by some standard statistical
scheme (usually parametric), and at each time using the control choice 
that would be optimal for that time and state if the current estimate 
were indeed the correct model. This runs into the usual 'identifiability' 
problem: several models may lead to control choices that in turn lead to 
identical output behavior, making it impossible to discriminate between 
these models. Many variations have been suggested to work around this 
problem, such as additional randomization of controls as 'probes'. 

3. Control of modified diffusions and control with additional constraints: Issues similar to those of the preceding section have been explored for reflected diffusions (which often arise as heavy traffic approximations of controlled queues [?]), diffusions with 'jumps' or switching modes [54], [88], etc. Another related development is control under additional constraints [?]. Here the controller seeks to minimize one cost functional subject to a bound on one or more ancillary cost functionals.

4. Multiple timescales: These are problems wherein different components of the controlled diffusion move on different time-scales, as in:

$$dX_1(t) = m^{(1)}(X_1(t), X_2(t), u(t))\,dt + \sigma^{(1)}(X_1(t), X_2(t))\,dW_1(t),$$

$$dX_2(t) = \frac{1}{\epsilon}\,m^{(2)}(X_1(t), X_2(t), u(t))\,dt + \frac{1}{\sqrt{\epsilon}}\,\sigma^{(2)}(X_1(t), X_2(t))\,dW_2(t),$$

where $\epsilon > 0$ is 'small'. This implies in particular that $X_2(\cdot)$ operates on a much faster time-scale than $X_1(\cdot)$. Intuitively, one would expect $X_2(\cdot)$ to see $X_1(\cdot)$ as quasi-static, whereas $X_1(\cdot)$ sees $X_2(\cdot)$ as almost equilibrated. This intuition is confirmed by the analysis, which allows one to analyze $X_1(\cdot)$ with its dynamics averaged over the asymptotic behavior (read 'stationary distribution' in the asymptotically stationary case) of $X_2(\cdot)$, when the latter is analyzed by freezing $X_1(\cdot)$ in its dynamics as though it were a constant parameter [?], [82].

5. Game problems: These are the problems that involve more than one con- 
troller with possibly different costs. The simplest is the two person zero 
sum case where two controllers have cost functionals that sum to zero, i.e., 
the cost of one is the reward of the other. The key result in this case is 
the minmax theorem which establishes the existence of a value, equalling 
both the minimum of the maximum (over the opponent's choices) cost 
paid by the first and the maximum of the minimum (over the opponent's 
choices) reward gained by the other. This value is then characterized by the appropriate Hamilton-Jacobi-Isaacs equation for the value function, which corresponds to replacing the 'min' in the HJB equation by 'minmax' or 'maxmin'. The more general $N$-person noncooperative case has $N$ controllers with different cost functionals. This is more complicated and one looks for a Nash equilibrium, i.e., a control policy profile for the controllers whereby no single controller can lower her cost by choosing differently if the rest don't change their controls. This leads to a system of HJB equations, coupled through each other's minimizing controls. See [?], [?], [?] for a sampler.
6. Mathematical finance: This has proved to be a rich source of problems in stochastic control in recent years, e.g., in option pricing, portfolio optimization, etc. We have already seen some examples in Section 2. The area is still exploding and merits a separate full length review. See [?], [?] for a perspective and [?], [?], [?] for a sample of recent contributions.
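Returning to singular control, the 'local time at a boundary' can be made concrete through the one-dimensional Skorokhod map, a standard construction of the minimal nondecreasing process $A$ keeping $X = Z + A$ nonnegative: $A(t) = \max\big(0, \max_{s \leq t}(-Z(s))\big)$, which increases only when $X(t) = 0$. The sketch below applies it in discrete time to a hypothetical unconstrained Euler path $Z$ with negative drift; all names and parameters are illustrative.

```python
import math
import random

def skorokhod_reflect(z):
    """Discrete Skorokhod map at 0: returns (x, a) with x = z + a >= 0, a nondecreasing,
    a[0] = 0 when z[0] >= 0, and a increasing only at indices where x == 0."""
    a, running = [], 0.0
    for zk in z:
        running = max(running, -zk)               # A(t) = max(0, max_{s <= t} (-Z(s)))
        a.append(running)
    x = [zk + ak for zk, ak in zip(z, a)]
    return x, a

rng = random.Random(2)
dt, z = 0.01, [0.5]
for _ in range(1000):
    # hypothetical unconstrained Euler path with drift -0.5 pushing it toward the barrier
    z.append(z[-1] - 0.5 * dt + math.sqrt(dt) * rng.gauss(0.0, 1.0))
x, a = skorokhod_reflect(z)
```

Here $A$ is of bounded variation (indeed nondecreasing) but typically not absolutely continuous in the continuous-time limit, which is exactly what distinguishes singular controls from ordinary drift controls.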

What next? To mention a few of the current themes, the most prominent 
of course remain the problems emerging from mathematical finance and heavy 
traffic limits of queues. Risk-sensitive control is another area which still offers 
interesting open problems, as do the control of degenerate diffusions and control
under partial observations. Extensions to infinite dimensional problems also
present several challenges of a technical nature. The biggest challenge, however, 
is on the computational front. Fast and accurate computational schemes are 
sought in particular by the finance community. 